ISCA Archive Interspeech 2009

A minimum v/u error approach to F0 generation in HMM-based TTS

Yao Qian, Frank K. Soong, Miaomiao Wang, Zhizheng Wu

HMM-based TTS can produce highly intelligible speech of decent quality. However, HMMs degrade when the feature vectors used in training are noisy. Among noisy features, pitch tracking errors and the corresponding flawed voiced/unvoiced (v/u) decisions are identified as two key factors in voice quality problems. In this paper, we propose a minimum v/u error approach to F0 generation. Prior knowledge of v/u is imposed on each Mandarin phone, and accumulated v/u posterior probabilities are used to search for the optimal v/u switching point in each VU or UV segment during generation. Objectively, the new approach is shown to improve v/u prediction performance, specifically on voiced-to-unvoiced swapping errors, which are reduced from 3.7% (baseline) to 2.0% (new approach). The improvement is also confirmed subjectively by an AB preference test: 72% (new approach) versus 22% (baseline).
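
The switching-point search mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact criterion, so the one below is an assumption: given per-frame voiced posteriors for a segment whose v/u order (VU or UV) is known a priori, pick the single boundary that minimizes the expected number of v/u frame errors, computed from accumulated posteriors. The function name `best_switch_point` and its input format are hypothetical and are not taken from the paper.

```python
from itertools import accumulate

def best_switch_point(voiced_post, order="VU"):
    """Return (k, expected_errors) for a segment with a single v/u switch.

    Frames [0, k) receive the first label of `order`, frames [k, T) the second.
    `voiced_post[t]` is the assumed posterior probability that frame t is voiced.
    """
    if order not in ("VU", "UV"):
        raise ValueError("order must be 'VU' or 'UV'")

    # Probability that each frame belongs to the FIRST class of the segment.
    first = [p if order == "VU" else 1.0 - p for p in voiced_post]

    # cum_first[k] = sum of first[0:k] (accumulated posteriors).
    cum_first = [0.0] + list(accumulate(first))
    total_first = cum_first[-1]

    best_k, best_err = 0, float("inf")
    for k in range(len(first) + 1):
        # Frames before k are labelled first-class: expected errors k - cum_first[k].
        # Frames from k on are labelled second-class: expected errors
        # total_first - cum_first[k].
        err = (k - cum_first[k]) + (total_first - cum_first[k])
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err


if __name__ == "__main__":
    # Hypothetical posteriors for a voiced-then-unvoiced (VU) segment.
    k, err = best_switch_point([0.9, 0.8, 0.7, 0.2, 0.1], order="VU")
    print(k, round(err, 3))
```

Because the accumulated sums are computed once, every candidate boundary is scored in constant time, so the search is linear in the segment length. Ties are resolved in favour of the earliest boundary, which is an arbitrary choice made for this sketch.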


doi: 10.21437/Interspeech.2009-137

Cite as: Qian, Y., Soong, F.K., Wang, M., Wu, Z. (2009) A minimum v/u error approach to F0 generation in HMM-based TTS. Proc. Interspeech 2009, 408-411, doi: 10.21437/Interspeech.2009-137

@inproceedings{qian09_interspeech,
  author={Yao Qian and Frank K. Soong and Miaomiao Wang and Zhizheng Wu},
  title={{A minimum v/u error approach to F0 generation in HMM-based TTS}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={408--411},
  doi={10.21437/Interspeech.2009-137}
}