Voice conversion is a technique for producing utterances in any target speaker's voice from a single source speaker's utterance. In this paper, we apply cross-language voice conversion between Japanese and English to a system based on a Gaussian Mixture Model (GMM) method and STRAIGHT, a high-quality vocoder. To investigate the effects of this conversion system across different languages, we recorded two sets of bilingual utterances and performed voice conversion experiments using a mapping function that converts the acoustic feature parameters of a source speaker to those of a target speaker. The mapping functions were trained on bilingual databases of both Japanese and English speech. An objective evaluation using Mel cepstrum distortion (Mel CD) confirmed that the system can perform cross-language voice conversion with the same performance as conversion within a single language.
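For readers unfamiliar with the objective measure, the following is a minimal sketch of how Mel CD is commonly computed between converted and target mel-cepstrum sequences. The abstract does not give the exact formula used in the paper, so the widely used definition (in dB, with the 0th energy coefficient excluded) is assumed here. The function name `mel_cepstral_distortion`, the `skip_c0` flag, and the requirement that the two sequences are already time-aligned frame by frame are all illustrative assumptions.

```python
import math


def mel_cepstral_distortion(converted, target, skip_c0=True):
    """Mean Mel CD in dB between two time-aligned mel-cepstrum sequences.

    Each argument is a list of frames; each frame is a list of
    mel-cepstral coefficients. Assumed definition (not taken from the paper):
        MelCD = (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2)
    averaged over frames.
    """
    if not converted or len(converted) != len(target):
        raise ValueError("sequences must be non-empty and equally long")

    start = 1 if skip_c0 else 0       # drop the energy term c0 if requested
    scale = 10.0 / math.log(10.0)     # converts the distance to dB

    total = 0.0
    for conv_frame, tgt_frame in zip(converted, target):
        if len(conv_frame) != len(tgt_frame):
            raise ValueError("frames must have the same number of coefficients")
        squared_error = sum(
            (c - t) ** 2
            for c, t in zip(conv_frame[start:], tgt_frame[start:])
        )
        total += scale * math.sqrt(2.0 * squared_error)

    return total / len(converted)
```

Under this definition, a lower value means the converted spectrum is closer to the target speaker's spectrum. In practice the two sequences usually have different lengths and would first be aligned, for example with dynamic time warping, which this sketch leaves out.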
Cite as: Mashimo, M., Toda, T., Shikano, K., Campbell, N. (2001) Evaluation of cross-language voice conversion based on GMM and straight. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 361-364, doi: 10.21437/Eurospeech.2001-111
@inproceedings{mashimo01_eurospeech, author={Mikiko Mashimo and Tomoki Toda and Kiyohiro Shikano and Nick Campbell}, title={{Evaluation of cross-language voice conversion based on GMM and straight}}, year=2001, booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)}, pages={361--364}, doi={10.21437/Eurospeech.2001-111} }