5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Duration Modeling For HMM-Based Speech Synthesis

Takayoshi Yoshimura (1), Keiichi Tokuda (1), Takashi Masuko (2), Takao Kobayashi (2), Tadashi Kitamura (1)

(1) Nagoya Institute of Technology, Japan
(2) Tokyo Institute of Technology, Japan

This paper proposes a new approach to state duration modeling for HMM-based speech synthesis. The set of state durations of each phoneme HMM is modeled by a multi-dimensional Gaussian distribution, and the duration models are clustered using a decision-tree-based context clustering technique. In the synthesis stage, state durations are determined from these state duration models. We take into account contextual factors such as stress-related and locational factors in addition to phone identity. Experimental results show that we can synthesize good-quality speech with natural timing, and that the speaking rate can be varied easily.
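
To make the idea concrete, the following is a minimal sketch of how a Gaussian state duration model could be estimated for one phoneme HMM and then used to pick durations at synthesis time. It is not taken from the paper: the function names, the toy data, the diagonal-covariance simplification, and the rate-control rule (shifting each mean by `rho` times its variance) are all assumptions made for illustration, and the decision-tree context clustering step is omitted.

```python
from statistics import mean, pvariance


def fit_duration_model(samples):
    """Estimate a per-state Gaussian from observed state durations.

    samples: list of duration vectors (in frames), one vector per
    occurrence of the phoneme, one entry per HMM state.
    Assumption: diagonal covariance, so only per-state variances are kept.
    """
    columns = list(zip(*samples))  # regroup durations by state
    means = [mean(col) for col in columns]
    variances = [pvariance(col) for col in columns]
    return means, variances


def choose_durations(means, variances, rho=0.0):
    """Pick one duration per state (hypothetical rate-control rule).

    rho = 0 uses the mean durations; rho < 0 shortens and rho > 0
    lengthens, with high-variance states absorbing most of the change.
    Every state keeps at least one frame.
    """
    return [max(1, round(m + rho * v)) for m, v in zip(means, variances)]


# Toy data: three occurrences of a 3-state phoneme HMM (made-up values).
samples = [[3, 6, 4], [2, 8, 4], [4, 10, 4]]
means, variances = fit_duration_model(samples)

normal = choose_durations(means, variances)
fast = choose_durations(means, variances, rho=-0.5)
slow = choose_durations(means, variances, rho=1.0)
```

In this sketch a single scalar controls the speaking rate, and a state whose duration never varied in the training data (the third state above) keeps its length at every rate.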


Bibliographic reference. Yoshimura, Takayoshi / Tokuda, Keiichi / Masuko, Takashi / Kobayashi, Takao / Kitamura, Tadashi (1998): "Duration modeling for HMM-based speech synthesis", in ICSLP-1998, paper 0939.