INTERSPEECH 2009
10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Speaker Adaptation Using a Parallel Phone Set Pronunciation Dictionary for Thai-English Bilingual TTS

Anocha Rugchatjaroen, Nattanun Thatphithakkul, Ananlada Chotimongkol, Ausdang Thangthai, Chai Wutiwiwatchai

NECTEC, Thailand

This paper develops a bilingual Thai-English TTS system from two monolingual HMM-based TTS systems. An English Nagoya HMM-based TTS system (HTS) provides correct pronunciations of English words, but its voice differs from the voice of the Thai HTS system. We apply the CSMAPLR adaptation technique to make the English voice sound more similar to the Thai voice. To overcome the phone mapping problem that normally occurs with a pair of languages that have dissimilar phone sets, we use cross-language pronunciation mapping through a parallel phone set pronunciation dictionary. The results of a subjective listening test show that English words synthesized by our proposed system are more intelligible (MOS higher by 0.61) than those synthesized by the existing bilingual Thai-English TTS system. Moreover, with the proposed adaptation method, the synthesized English words sound more similar to the synthesized Thai words.

Bibliographic reference.  Rugchatjaroen, Anocha / Thatphithakkul, Nattanun / Chotimongkol, Ananlada / Thangthai, Ausdang / Wutiwiwatchai, Chai (2009): "Speaker adaptation using a parallel phone set pronunciation dictionary for Thai-English bilingual TTS", In INTERSPEECH-2009, 1795-1798.