11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Improved Generation of Fundamental Frequency in HMM-Based Speech Synthesis Using Generation Process Model

Miaomiao Wang, Miaomiao Wen, Keikichi Hirose, Nobuaki Minematsu

University of Tokyo, Japan

The HMM-based text-to-speech (TTS) system can produce high-quality synthetic speech with flexible modeling of spectral and prosodic parameters. However, the quality of the synthetic speech degrades when the feature vectors used in training are noisy. Among noisy features, pitch tracking errors and the resulting flawed voiced/unvoiced (VU) decisions are the two key causes of voice quality problems. In this paper, an F0 generation process model is used to re-estimate F0 values in regions of pitch tracking errors as well as in unvoiced regions. Prior knowledge of VU status is assigned to each Mandarin phoneme and used for the VU decision. Because F0 values are available in unvoiced regions as well, F0 can then be modeled as a continuous contour within the standard HMM framework.
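The abstract does not spell out which generation process model is used. The sketch below assumes it is the Fujisaki command-response model, in which log F0 is a baseline plus phrase responses and accent/tone responses. The constants alpha = 3.0, beta = 20.0 and gamma = 0.9 are commonly quoted defaults, not values from the paper, and the command lists are hypothetical inputs. How the paper estimates commands from the observed contour and detects tracking errors is not shown; here unreliable frames are simply marked `None` and replaced with model-generated F0, which yields a contour defined in every frame.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Assumption: Fujisaki command-response model with commonly quoted default constants.
ALPHA, BETA, GAMMA = 3.0, 20.0, 0.9


def phrase_response(t: float, alpha: float = ALPHA) -> float:
    """Phrase control mechanism: Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0.0 else 0.0


def accent_response(t: float, beta: float = BETA, gamma: float = GAMMA) -> float:
    """Accent/tone control mechanism: Ga(t) = min(1 - (1 + beta*t) exp(-beta*t), gamma) for t >= 0."""
    if t < 0.0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_log_f0(
    t: float,
    fb: float,
    phrases: Sequence[Tuple[float, float]],          # hypothetical (T0, Ap) pairs
    accents: Sequence[Tuple[float, float, float]],   # hypothetical (T1, T2, Aa) triples
) -> float:
    """ln F0(t) = ln Fb + sum Ap*Gp(t - T0) + sum Aa*[Ga(t - T1) - Ga(t - T2)]."""
    value = math.log(fb)
    for t0, ap in phrases:
        value += ap * phrase_response(t - t0)
    for t1, t2, aa in accents:
        value += aa * (accent_response(t - t1) - accent_response(t - t2))
    return value


def fill_f0(
    observed: Sequence[Optional[float]],  # F0 per frame in Hz; None marks unvoiced/unreliable frames
    frame_shift: float,                   # frame shift in seconds
    fb: float,
    phrases: Sequence[Tuple[float, float]],
    accents: Sequence[Tuple[float, float, float]],
) -> List[float]:
    """Keep reliable frames and substitute model-generated F0 elsewhere (hypothetical helper)."""
    filled: List[float] = []
    for i, f0 in enumerate(observed):
        if f0 is None:
            filled.append(math.exp(fujisaki_log_f0(i * frame_shift, fb, phrases, accents)))
        else:
            filled.append(f0)
    return filled
```

For example, `fill_f0([None, 180.0, None], 0.005, 120.0, [(0.0, 0.4)], [(0.005, 0.25, 0.3)])` keeps the observed 180 Hz frame and fills the first and third frames from the model, so every frame carries an F0 value that a single-stream HMM can model.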


Bibliographic reference.  Wang, Miaomiao / Wen, Miaomiao / Hirose, Keikichi / Minematsu, Nobuaki (2010): "Improved generation of fundamental frequency in HMM-based speech synthesis using generation process model", In INTERSPEECH-2010, 2166-2169.