ISCA Archive Interspeech 2009

Rich context modeling for high quality HMM-based TTS

Zhi-Jie Yan, Yao Qian, Frank K. Soong

This paper presents a rich context modeling approach to high quality HMM-based speech synthesis. We first analyze the oversmoothing problem in conventional decision-tree-tied HMMs, and then propose to model the training speech tokens with rich context models. A special training procedure is adopted for reliable estimation of the rich context model parameters. In synthesis, a search algorithm following context-based pre-selection is performed to determine the optimal rich context model sequence, which generates natural and crisp output speech. Experimental results show that spectral envelopes synthesized by the rich context models have crisper formant structures and evolve with richer details than those obtained by the conventional models. The speech quality improvement is also perceived by listeners in a subjective preference test, in which 76% of the sentences synthesized using rich context modeling are preferred.

doi: 10.21437/Interspeech.2009-142

Cite as: Yan, Z.-J., Qian, Y., Soong, F.K. (2009) Rich context modeling for high quality HMM-based TTS. Proc. Interspeech 2009, 1755-1758, doi: 10.21437/Interspeech.2009-142

@inproceedings{yan09_interspeech,
  author={Zhi-Jie Yan and Yao Qian and Frank K. Soong},
  title={{Rich context modeling for high quality HMM-based TTS}},
  booktitle={Proc. Interspeech 2009},
  year={2009},
  pages={1755--1758},
  doi={10.21437/Interspeech.2009-142}
}