Sixth International Conference on Spoken Language Processing
This paper presents a method for improving the naturalness of Thai text-to-speech synthesis, organized in four main parts. The pausing module determines break locations when synthesizing Thai text, which has no explicit sentence, phrase, or word boundaries. For syllable duration and tone generation, a set of rules produces prosodic parameters suited to more natural-sounding speech. The syllable duration rule applies Klatt's method at the syllable level. The tonal rule accounts for tonal coarticulation and F0 downdrift when generating the F0 contour parameters. In demisyllable concatenation, the TD-PSOLA technique modifies the waveform to obtain the required prosody, and LSP-based smoothing of the concatenation boundaries imitates the cross-syllable coarticulation effect. A comparative quality test shows a significant improvement with the proposed method.
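As an illustration of the duration rule mentioned above, the following is a minimal sketch of the general form of Klatt's duration model, in which a segment is shortened or lengthened between a minimum and an inherent duration by a chain of percentage rules. Applying it to syllables follows the abstract; the function name, the argument names, and all numeric values are hypothetical and are not taken from the paper.

```python
def klatt_duration(inherent_ms, minimum_ms, rule_percents):
    """Klatt-style duration: DUR = MINDUR + (INHDUR - MINDUR) * PRCNT / 100.

    inherent_ms   -- inherent duration of the unit (assumed value, in ms)
    minimum_ms    -- minimum duration the unit can be compressed to (assumed)
    rule_percents -- percentages contributed by each applicable rule;
                     they are multiplied together, starting from 100
    """
    prcnt = 100.0
    for p in rule_percents:
        prcnt *= p / 100.0
    return minimum_ms + (inherent_ms - minimum_ms) * prcnt / 100.0


if __name__ == "__main__":
    # Hypothetical syllable: 300 ms inherent, 120 ms minimum, with one
    # lengthening rule (140%) and one shortening rule (80%) applied.
    print(klatt_duration(300.0, 120.0, [140.0, 80.0]))
```

With no rules applied the function returns the inherent duration, and a rule of 0% compresses the unit to its minimum duration, which is the behavior the formula is designed to give.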
Bibliographic reference. Mittrapiyanuruk, Pradit / Hansakunbuntheung, Chatchawarn / Tesprasit, Virongrong / Sornlertlamvanich, Virach (2000): "Improving naturalness of Thai text-to-speech synthesis by prosodic rule", In ICSLP-2000, vol.3, 334-337.