EUROSPEECH 2003 - INTERSPEECH 2003
This paper presents a preliminary study on implementing an HMM-based Mandarin speech synthesis system, whose main advantage lies in its ability to generate a variety of voices. Several acoustic unit representations for Mandarin are compared in order to select an optimal acoustic model set. Syllabic vs. sub-syllabic, context-independent vs. context-dependent, toneless vs. tonal, and initial-final vs. preme-toneme models, as well as models with different numbers of states, are each investigated. To take full advantage of HMM-based speech synthesis, some factors affecting speaker adaptation quality, especially the amount of adaptation data, are also studied.
Bibliographic reference. Gu, Wentao / Hirose, Keikichi (2003): "Acoustic model selection and voice quality assessment for HMM-based Mandarin speech synthesis", In EUROSPEECH-2003, 2457-2460.