Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Simultaneous Modeling of Spectrum, Pitch and Duration in HMM-Based Speech Synthesis

Takayoshi Yoshimura (1), Keiichi Tokuda (1), Takashi Masuko (2), Takao Kobayashi (2), Tadashi Kitamura (1)

(1) Nagoya Institute of Technology, Gokiso, Shouwa-ku, Nagoya, Japan
(2) Tokyo Institute of Technology, Nagatsuta, Midori-ku, Yokohama, Japan

In this paper, we describe an HMM-based speech synthesis system in which spectrum, pitch and state duration are modeled simultaneously in a unified HMM framework. In the system, pitch and state duration are modeled by multi-space probability distribution HMMs and multi-dimensional Gaussian distributions, respectively. The distributions for the spectral parameters, pitch parameters and state durations are clustered independently using a decision-tree based context clustering technique. Synthetic speech is generated by a speech parameter generation algorithm from HMMs together with a mel-cepstrum based vocoding technique. Through informal listening tests, we have confirmed that the proposed system successfully synthesizes natural-sounding speech which resembles the speaker in the training database.
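As a rough illustration of the multi-space probability distribution (MSD) used for pitch, the sketch below (not the authors' implementation; all weights and Gaussian parameters are illustrative placeholders) evaluates the likelihood of a frame under a two-space MSD: a zero-dimensional "unvoiced" space carrying only a discrete weight, and a one-dimensional Gaussian space over log F0 for voiced frames.

```python
import math

def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def msd_likelihood(obs, w_voiced=0.8, mean=5.0, var=0.04):
    """Observation likelihood under a hypothetical two-space MSD.

    obs is None for an unvoiced frame (the discrete, zero-dimensional
    space), or a log-F0 value for a voiced frame (the Gaussian space).
    w_voiced, mean and var are placeholder parameters, not trained values.
    """
    w_unvoiced = 1.0 - w_voiced
    if obs is None:
        # Unvoiced frame: only the discrete space weight contributes.
        return w_unvoiced
    # Voiced frame: space weight times the Gaussian density over log F0.
    return w_voiced * gaussian_pdf(obs, mean, var)

print(msd_likelihood(None))   # unvoiced frame
print(msd_likelihood(5.0))    # voiced frame at the Gaussian mean
```

Because both voiced and unvoiced observations receive a properly weighted probability, the same HMM state can model a pitch stream that switches between continuous and discrete symbols, which is the property the paper exploits.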


Bibliographic reference.  Yoshimura, Takayoshi / Tokuda, Keiichi / Masuko, Takashi / Kobayashi, Takao / Kitamura, Tadashi (1999): "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis", In EUROSPEECH'99, 2347-2350.