This paper introduces a syllable-HMM-based Mandarin TTS system. Each syllable is modeled by a 10-state left-to-right HMM. We leverage the corpus and the front end of a concatenative TTS system to build the syllable-HMM-based TTS system. Furthermore, we exploit the unique consonant/vowel structure of Mandarin syllables to improve the voiced/unvoiced decision for HMM states. Evaluation results show that the syllable-HMM-based Mandarin TTS system, with a model size of 5.3 MB, achieves an overall quality close to that of a concatenative TTS system with a data size of 1 GB.
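As a rough illustration of the 10-state left-to-right topology mentioned above, the following minimal sketch builds a transition matrix in which each state may only stay in place or advance to the next state. The function name `left_to_right_transitions`, the self-loop probability of 0.6, and the absence of skip transitions are assumptions made for illustration only; the abstract does not specify these details.

```python
def left_to_right_transitions(n_states=10, self_loop=0.6):
    """Return an n_states x n_states left-to-right transition matrix.

    Assumptions (not from the paper): no skip transitions, a single
    shared self-loop probability, and an absorbing final state.
    """
    if not 0.0 < self_loop < 1.0:
        raise ValueError("self_loop must lie strictly between 0 and 1")

    matrix = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i < n_states - 1:
            matrix[i][i] = self_loop            # stay in the current state
            matrix[i][i + 1] = 1.0 - self_loop  # advance to the next state
        else:
            matrix[i][i] = 1.0                  # final state keeps all mass
    return matrix


if __name__ == "__main__":
    for row in left_to_right_transitions():
        print(" ".join(f"{p:.1f}" for p in row))
```

Every entry below the diagonal is zero, which is what makes the model left-to-right: once a state has been left, it cannot be revisited within the same syllable.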
Bibliographic reference. Shuang, Zhiwei / Kang, Shiyin / Shi, Qin / Qin, Yong / Cai, Lianhong (2009): "Syllable HMM based Mandarin TTS and comparison with concatenative TTS", In INTERSPEECH-2009, 1767-1770.