5th International Conference on Spoken Language Processing
This paper proposes a new approach to state duration modeling for HMM-based speech synthesis. A set of state durations of each phoneme HMM is modeled by a multi-dimensional Gaussian distribution, and duration models are clustered using a decision tree based context clustering technique. In the synthesis stage, state durations are determined by using the state duration models. In this paper, we take account of contextual factors such as stress-related factors and locational factors in addition to phone identity factors. Experimental results show that we can synthesize good quality speech with natural timing, and the speaking rate can be varied easily.
Bibliographic reference. Yoshimura, Takayoshi / Tokuda, Keiichi / Masuko, Takashi / Kobayashi, Takao / Kitamura, Tadashi (1998): "Duration modeling for HMM-based speech synthesis", In ICSLP-1998, paper 0939.