The Voice Conversion Challenge 2016

Tomoki Toda, Ling-Hui Chen, Daisuke Saito, Fernando Villavicencio, Mirjam Wester, Zhizheng Wu, Junichi Yamagishi


This paper describes the Voice Conversion Challenge 2016, devised by the authors to better understand different voice conversion (VC) techniques by comparing their performance on a common dataset. The task of the challenge was speaker conversion, i.e., transforming the voice identity of a source speaker into that of a target speaker while preserving the linguistic content. Using a common dataset consisting of 162 utterances for training and 54 utterances for evaluation from each of 5 source and 5 target speakers, 17 groups working in VC around the world each developed their own VC systems for every combination of source and target speakers, i.e., 25 systems in total, and generated voice samples converted by those systems. These samples were evaluated in terms of target speaker similarity and naturalness by 200 listeners in a controlled environment. This paper summarizes the design of the challenge, its results, and future plans, with the aim of sharing views on the unsolved problems and challenges faced by current VC techniques.
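The count of 25 systems follows from pairing each of the 5 source speakers with each of the 5 target speakers. The short Python sketch below enumerates those pairs. The speaker labels (S1..S5, T1..T5) are hypothetical placeholders rather than the identifiers used in the challenge data, and the per-entry sample count assumes one converted sample per evaluation utterance for every pair, which the abstract does not state explicitly.

```python
from itertools import product

# Figures taken from the abstract.
N_TRAIN_UTTERANCES = 162  # training utterances per speaker
N_EVAL_UTTERANCES = 54    # evaluation utterances per speaker

# Hypothetical speaker labels, used only for illustration.
source_speakers = [f"S{i}" for i in range(1, 6)]
target_speakers = [f"T{i}" for i in range(1, 6)]

# One conversion system per (source, target) combination.
pairs = list(product(source_speakers, target_speakers))
n_systems = len(pairs)

# Assumption: each system converts every evaluation utterance once.
n_converted_per_entry = n_systems * N_EVAL_UTTERANCES

print(f"{n_systems} source-target pairs per participating group")
print(f"{n_converted_per_entry} converted samples per group (under the assumption above)")
```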


DOI: 10.21437/Interspeech.2016-1066

Cite as

Toda, T., Chen, L., Saito, D., Villavicencio, F., Wester, M., Wu, Z., Yamagishi, J. (2016) The Voice Conversion Challenge 2016. Proc. Interspeech 2016, 1632-1636.

BibTeX
@inproceedings{Toda+2016,
author={Tomoki Toda and Ling-Hui Chen and Daisuke Saito and Fernando Villavicencio and Mirjam Wester and Zhizheng Wu and Junichi Yamagishi},
title={The Voice Conversion Challenge 2016},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-1066},
url={http://dx.doi.org/10.21437/Interspeech.2016-1066},
pages={1632--1636}
}