12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Speaker-Adaptive Speech Synthesis Based on Eigenvoice Conversion and Language-Dependent Prosodic Conversion in Speech-to-Speech Translation

Nobuhiko Hattori (1), Tomoki Toda (1), Hisashi Kawai (2), Hiroshi Saruwatari (1), Kiyohiro Shikano (1)

(1) NAIST, Japan
(2) NICT, Japan

This paper describes a novel approach to speaker-adaptive speech synthesis for speech-to-speech translation, based on voice conversion (VC). The voice quality of translated speech in the output language usually differs from that of the translation system's input speaker, because the text-to-speech system is built from a different speaker's voice in the output language. To render the input speaker's voice quality in the translated speech, we propose a voice quality control method based on one-to-many eigenvoice conversion (EVC) and language-dependent prosodic conversion. Spectral parameters of the translated speech are effectively converted by one-to-many EVC, which enables unsupervised speaker adaptation. Moreover, prosodic parameters are modified to account for their global differences between the input and output languages. The effectiveness of the proposed method is confirmed by experimental evaluations of cross-lingual VC among Japanese, English, and Chinese.

Bibliographic reference. Hattori, Nobuhiko / Toda, Tomoki / Kawai, Hisashi / Saruwatari, Hiroshi / Shikano, Kiyohiro (2011): "Speaker-adaptive speech synthesis based on eigenvoice conversion and language-dependent prosodic conversion in speech-to-speech translation", in INTERSPEECH-2011, 2769-2772.