Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

A Taiwanese (Min-nan) Text-to-Speech (TTS) System Based on Automatically Generated Synthetic Units

Ren-yuan Lyu (1), Zhen-hong Fu (1), Yuang-chin Chiang (2), Hui-mei Liu (2)

(1) Dept. of Electrical Engineering, Chang Gung University, Taoyuan, Taiwan
(2) Inst. of Statistics, Tsing Hua University, Hsin-chu, Taiwan

This paper describes a Taiwanese (Min-nan) text-to-speech (TTS) system built on automatically generated synthetic units and designed around several phonetic and linguistic characteristics specific to Taiwanese. Some basic facts about Taiwanese that are useful in a TTS system are summarized, including tone sandhi, the written format, and other issues. Three functional modules, namely a text analysis module, a prosody module, and a waveform synthesis module, are described in turn. The synthetic units in the waveform synthesis module come from two sources: (1) a set of tonal syllables uttered in isolation and (2) a specially designed continuous speech corpus. An HMM-based large-vocabulary Taiwanese speech recognizer is used to perform forced alignment on the speech corpus. A segmentation consistency rate of 85.17% within 20 ms is achieved.
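
The abstract does not define how the segmentation consistency rate is computed. A common reading, assumed here, is the fraction of automatically placed segment boundaries that fall within a fixed tolerance (20 ms) of their reference boundaries. The minimal Python sketch below illustrates that reading only; the function name `consistency_rate`, the one-to-one pairing of boundaries, and the sample boundary times are hypothetical and are not taken from the paper.

```python
def consistency_rate(auto_ms, ref_ms, tolerance_ms=20):
    """Fraction of boundary pairs that agree within tolerance_ms.

    Assumption (not from the paper): auto_ms and ref_ms list the same
    boundaries in the same order, with times given in milliseconds.
    """
    if len(auto_ms) != len(ref_ms):
        raise ValueError("boundary lists must have the same length")
    if not auto_ms:
        raise ValueError("at least one boundary is required")
    # Count boundaries whose absolute deviation is within the tolerance.
    hits = sum(1 for a, r in zip(auto_ms, ref_ms) if abs(a - r) <= tolerance_ms)
    return hits / len(auto_ms)


# Hypothetical boundary times (ms): forced-alignment output vs. reference labels.
auto = [100, 253, 410, 598]
ref = [105, 230, 415, 600]
rate = consistency_rate(auto, ref)  # deviations are 5, 23, 5, 2 ms
print(f"{rate:.2%}")
```

Under this reading, the reported 85.17% would mean that roughly 85 of every 100 automatically placed boundaries lie within 20 ms of the reference.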

Bibliographic reference.  Lyu, Ren-yuan / Fu, Zhen-hong / Chiang, Yuang-chin / Liu, Hui-mei (2000): "A Taiwanese (min-nan) text-to-speech (TTS) system based on automatically generated synthetic units", In ICSLP-2000, vol.2, 399-402.