Sixth European Conference on Speech Communication and Technology
In this paper, we describe an HMM-based speech synthesis system in which spectrum, pitch and state duration are modeled simultaneously in a unified HMM framework. In the system, pitch and state duration are modeled by multi-space probability distribution HMMs and multi-dimensional Gaussian distributions, respectively. The distributions for the spectral parameter, the pitch parameter and the state duration are clustered independently using a decision-tree based context clustering technique. Synthetic speech is generated using a speech parameter generation algorithm from HMMs and a mel-cepstrum based vocoding technique. Through informal listening tests, we have confirmed that the proposed system successfully synthesizes natural-sounding speech which resembles the speaker in the training database.
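As a rough illustration of the pitch model named above, the following sketch evaluates a per-state observation likelihood under a multi-space probability distribution with only two spaces: a one-dimensional Gaussian over log F0 for voiced frames and a zero-dimensional space for unvoiced frames. This two-space, single-Gaussian setup is an assumption made for brevity, and the function names, parameter names and numeric values are hypothetical; it is not the authors' implementation.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def msd_pitch_likelihood(log_f0, voiced_weight, mean, var):
    """Observation likelihood of one frame under a two-space distribution.

    log_f0:        log F0 value for a voiced frame, or None for an unvoiced frame
    voiced_weight: weight of the voiced (one-dimensional) space, in [0, 1]
    mean, var:     Gaussian parameters of the voiced space (hypothetical values)
    """
    if log_f0 is None:
        # Unvoiced frame: the zero-dimensional space contributes only its weight.
        return 1.0 - voiced_weight
    # Voiced frame: space weight times the Gaussian density of the observation.
    return voiced_weight * gaussian_pdf(log_f0, mean, var)


# Example state: mostly voiced, log F0 centred near 5.0 (about 148 Hz).
voiced = msd_pitch_likelihood(5.0, voiced_weight=0.8, mean=5.0, var=0.04)
unvoiced = msd_pitch_likelihood(None, voiced_weight=0.8, mean=5.0, var=0.04)
```

Treating the unvoiced case as a space with its own weight lets a single state score both voiced and unvoiced frames, without requiring an artificial F0 value for unvoiced regions.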
Bibliographic reference. Yoshimura, Takayoshi / Tokuda, Keiichi / Masuko, Takashi / Kobayashi, Takao / Kitamura, Tadashi (1999): "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis", In EUROSPEECH'99, 2347-2350.