This paper proposes an approach to improving the tone correctness of speech synthesized by an HMM-based Thai speech synthesis system. In the tree-based context clustering process, tone groups and tone types are used to design four decision tree structures: a single binary tree structure, a simple tone-separated tree structure, a constancy-based-tone-separated tree structure, and a trend-based-tone-separated tree structure. A subjective evaluation of tone correctness is conducted using the tone perception of eight Thai listeners. The simple tone-separated tree structure gives the highest tone correctness, while the single binary tree structure gives the lowest. Moreover, applying additional contextual tone information to all of the decision tree structures yields a significant improvement in tone correctness. Finally, an evaluation of syllable duration distortion across the four structures shows that the constancy-based-tone-separated and trend-based-tone-separated tree structures can alleviate the distortions that appear when the simple tone-separated tree structure is used.
Cite as: Chomphan, S., Kobayashi, T. (2007) Design of tree-based context clustering for an HMM-based Thai speech synthesis system. Proc. 6th ISCA Workshop on Speech Synthesis (SSW 6), 160-165
@inproceedings{chomphan07_ssw,
  author={Suphattharachai Chomphan and Takao Kobayashi},
  title={{Design of tree-based context clustering for an HMM-based Thai speech synthesis system}},
  year=2007,
  booktitle={Proc. 6th ISCA Workshop on Speech Synthesis (SSW 6)},
  pages={160--165}
}