This paper describes a novel voice conversion (VC) approach to speaker-adaptive speech synthesis for speech-to-speech translation. The voice quality of translated speech in the output language usually differs from that of the translation system's input speaker, because the text-to-speech system is built from a different speaker's voice in the output language. To render the input speaker's voice quality in the translated speech, we propose a voice quality control method based on one-to-many eigenvoice conversion (EVC) and language-dependent prosodic conversion. Spectral parameters of the translated speech are converted by one-to-many EVC, which enables unsupervised speaker adaptation. Moreover, prosodic parameters are modified to account for global prosodic differences between the input and output languages. The effectiveness of the proposed method is confirmed by experimental evaluations of cross-lingual VC among Japanese, English, and Chinese.
Bibliographic reference. Hattori, Nobuhiko / Toda, Tomoki / Kawai, Hisashi / Saruwatari, Hiroshi / Shikano, Kiyohiro (2011): "Speaker-adaptive speech synthesis based on eigenvoice conversion and language-dependent prosodic conversion in speech-to-speech translation", In INTERSPEECH-2011, 2769-2772.