10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Asynchronous F0 and Spectrum Modeling for HMM-Based Speech Synthesis

Cheng-Cheng Wang, Zhen-Hua Ling, Li-Rong Dai

University of Science & Technology of China, China

This paper proposes an asynchronous model structure for fundamental frequency (F0) and spectrum modeling in HMM-based parametric speech synthesis, with the aim of improving F0 prediction. In the conventional system, F0 and spectrum features are assumed to be synchronous. Because these two features are produced by the movement of different speech organs, an explicitly asynchronous model structure is introduced. At the training stage, F0 models are trained asynchronously with respect to the spectrum models. At the synthesis stage, the two features are generated separately. Objective and subjective evaluation results show that the proposed method effectively improves the accuracy of F0 prediction.

