This paper describes a novel approach to Thai speech synthesis. Spectrum, pitch, and phone duration are modeled simultaneously in a unified HMM framework, and their parameter distributions are clustered independently using a decision-tree-based context clustering technique with several clustering styles. A group of contextual factors that affect spectrum, pitch, and state duration, such as tone type and part of speech, is taken into account, with particular attention to the needs of a tonal language. Evaluation of the synthesized speech shows that tone correctness is significantly improved with some clustering styles. Moreover, the implemented system reproduces prosody (or naturalness, in some sense) better than a unit-selection-based system built on the same speech database.
Bibliographic reference. Chomphan, Suphattharachai / Kobayashi, Takao (2007): "Implementation and evaluation of an HMM-based Thai speech synthesis system", In INTERSPEECH-2007, 2849-2852.