The Seventh ISCA Tutorial and Research Workshop on Speech Synthesis
Most studies on Mandarin HTS (HMM-based text-to-speech system) have taken the initial/final as the basic acoustic unit. However, this makes it challenging to develop a multilingual HTS in a uniform and consistent way, since most other languages use the phoneme as the basic phonetic unit. It also makes cross-lingual adaptation, which requires mapping phonemes between languages, hard to apply. This is particularly true for a unified ASR and HTS system, because most ASR systems are phoneme based. In this paper, we propose a phoneme-based Mandarin HTS system and evaluate it systematically against the initial/final system. The experimental results show that using the phoneme as the acoustic unit for Mandarin HTS is a promising unified approach: it enables better and more uniform development alongside other languages while significantly reducing the number of acoustic units. The flat-start training scheme is also evaluated, and it is shown to solve the phoneme segmentation problem without any performance degradation for the phoneme-based Mandarin HTS system. The result is a fully automatic approach that does not depend on any particular ASR system.
Index Terms: speech synthesis, Mandarin HTS, flat-start training, speaker adaptation
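The abstract credits flat-start training with removing the need for phoneme segmentation from an external ASR system. The sketch below shows the usual idea behind a flat start, as done for example by HTK's `HCompV`. It pools every frame of the training data, computes a global per-dimension mean and variance, and copies them into every state of every unit model. Embedded re-estimation (not shown) then learns the unit boundaries from the transcriptions alone. This is an illustration under stated assumptions, not the authors' implementation. It assumes one diagonal Gaussian per state and ignores the multi-space and duration models used in HTS. The names `global_stats`, `flat_start`, and `GaussianState`, the variance floor value, and the unit labels in the example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical container: one diagonal Gaussian per HMM state (assumption).
@dataclass
class GaussianState:
    mean: List[float]
    var: List[float]


def global_stats(
    utterances: Sequence[Sequence[Sequence[float]]], var_floor: float = 1e-3
) -> Tuple[List[float], List[float]]:
    """Pool all frames of all utterances; return per-dimension mean and variance."""
    dim = len(utterances[0][0])
    n = 0
    sums = [0.0] * dim
    sq_sums = [0.0] * dim
    for utt in utterances:
        for frame in utt:
            n += 1
            for d, x in enumerate(frame):
                sums[d] += x
                sq_sums[d] += x * x
    mean = [s / n for s in sums]
    # Population variance, floored so no dimension collapses to zero.
    var = [max(sq_sums[d] / n - mean[d] ** 2, var_floor) for d in range(dim)]
    return mean, var


def flat_start(
    units: Sequence[str],
    n_states: int,
    utterances: Sequence[Sequence[Sequence[float]]],
    var_floor: float = 1e-3,
) -> Dict[str, List[GaussianState]]:
    """Give every state of every unit the same global statistics.

    No segmentation is required: only the pooled frames are used. The
    models are then refined by embedded re-estimation (not shown here).
    """
    mean, var = global_stats(utterances, var_floor)
    return {
        u: [GaussianState(list(mean), list(var)) for _ in range(n_states)]
        for u in units
    }


if __name__ == "__main__":
    # Toy 2-dimensional "features" from two utterances (made-up numbers).
    utts = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]
    models = flat_start(["sil", "a", "n"], n_states=5, utterances=utts)
    print(models["a"][0])
```

Because every model starts identical, the choice of acoustic unit (phoneme or initial/final) affects only the label inventory passed in as `units`, not the initialization itself.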
Bibliographic reference. Guan, Yong / Tian, Jilei / Wu, Yi-Jian / Yamagishi, Junichi / Nurminen, Jani (2010): "An unified and automatic approach of Mandarin HTS system", In SSW7-2010, 236-239.