10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Syllable HMM Based Mandarin TTS and Comparison with Concatenative TTS

Zhiwei Shuang (1), Shiyin Kang (2), Qin Shi (1), Yong Qin (1), Lianhong Cai (2)

(1) IBM China Research Lab, China
(2) Tsinghua University, China

This paper introduces a Syllable HMM based Mandarin TTS system. Each syllable is modeled by a 10-state left-to-right HMM. We reuse the corpus and the front end of a concatenative TTS system to build the Syllable HMM based TTS system. Furthermore, we exploit the distinctive consonant/vowel structure of Mandarin syllables to improve the voiced/unvoiced decision for HMM states. Evaluation results show that the Syllable HMM based Mandarin TTS system, with a model size of 5.3 MB, achieves an overall quality close to that of a concatenative TTS system with a data size of 1 GB.
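As an illustration of the 10-state left-to-right topology mentioned above, the following minimal sketch builds a transition matrix in which each state may only repeat itself or advance to the next state. It is not taken from the paper: the function name `left_to_right_transitions`, the no-skip assumption, the absorbing final state, and the self-loop probability of 0.6 are all hypothetical choices made for the example.

```python
def left_to_right_transitions(num_states=10, self_loop=0.6):
    """Build a row-stochastic transition matrix for a left-to-right HMM.

    Assumptions (illustrative, not from the paper): no skip transitions,
    one shared self-loop probability, and an absorbing final state.
    """
    if num_states < 1:
        raise ValueError("num_states must be at least 1")
    if not 0.0 <= self_loop < 1.0:
        raise ValueError("self_loop must be in [0, 1)")

    matrix = [[0.0] * num_states for _ in range(num_states)]
    for i in range(num_states):
        if i == num_states - 1:
            # Last state: stay until the syllable model is exited.
            matrix[i][i] = 1.0
        else:
            matrix[i][i] = self_loop            # remain in the current state
            matrix[i][i + 1] = 1.0 - self_loop  # advance to the next state
    return matrix


# One such matrix per syllable model; 10 states as stated in the abstract.
A = left_to_right_transitions(num_states=10)
```

Every entry below the diagonal is zero, so a state sequence can never move backwards within a syllable. In a trained system the transition probabilities (or explicit state durations) would be estimated from the corpus rather than fixed by hand.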


Bibliographic reference. Shuang, Zhiwei / Kang, Shiyin / Shi, Qin / Qin, Yong / Cai, Lianhong (2009): "Syllable HMM based Mandarin TTS and comparison with concatenative TTS", in INTERSPEECH-2009, 1767-1770.