11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

HMM Based TTS for Mixed Language Text

Zhiwei Shuang (1), Shiyin Kang (2), Yong Qin (3), Lirong Dai (1), Lianhong Cai (2)

(1) University of Science & Technology of China, China
(2) Tsinghua University, China
(3) IBM Research, China

Much current text, especially web content, mixes languages, such as Mandarin text with embedded English words. For synthesized speech of such mixed-language content to sound natural, it should be synthesized with a single voice. This task is very challenging, however, because it is hard to find a voice talent who speaks both languages well enough. If an HMM-based TTS system is built directly on the non-native speaker's training corpus, the synthesized speech sounds unnatural. In this paper, we propose using speaker adaptation technology to leverage the native speaker's data to generate more natural speech for the non-native speaker. Evaluation results show that the proposed method can significantly improve the speaker consistency and naturalness of synthesized speech for mixed-language text.

Bibliographic reference.  Shuang, Zhiwei / Kang, Shiyin / Qin, Yong / Dai, Lirong / Cai, Lianhong (2010): "HMM based TTS for mixed language text", In INTERSPEECH-2010, 618-621.