Text content today, especially on the web, often mixes languages, e.g. Mandarin text interspersed with English words. For synthesized speech of such mixed-language content to sound natural, the whole text should be synthesized in a single voice. This is very challenging, however, because it is hard to find a voice talent who speaks both languages well. If an HMM-based TTS system is built directly from a non-native speaker's training corpus, the synthesized speech sounds unnatural. In this paper, we propose using speaker adaptation to leverage a native speaker's data and generate more natural speech in the non-native speaker's voice. Evaluation results show that the proposed method significantly improves the speaker consistency and naturalness of synthesized speech for mixed-language text.
Bibliographic reference. Shuang, Zhiwei / Kang, Shiyin / Qin, Yong / Dai, Lirong / Cai, Lianhong (2010): "HMM based TTS for mixed language text", In INTERSPEECH-2010, 618-621.