This paper proposes an asynchronous model structure for fundamental frequency (F0) and spectrum modeling in HMM-based parametric speech synthesis, with the aim of improving F0 prediction. In the conventional system, F0 and spectral features are assumed to be synchronous. However, the two features are produced by the movements of different speech organs, so an explicitly asynchronous model structure is introduced. At the training stage, the F0 models are trained asynchronously with the spectrum models. At the synthesis stage, the two features are generated separately. Objective and subjective evaluation results show that the proposed method effectively improves the accuracy of F0 prediction.
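As a rough illustration of what "asynchronous" means at the synthesis stage, the sketch below contrasts a synchronous setup, where the F0 stream reuses the spectrum stream's state segmentation, with an asynchronous one, where each stream follows its own state durations. It is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the 3-state model, and every numeric value are hypothetical. The sketch also only repeats each state's mean for its duration, whereas a real HMM-based synthesizer would run parameter generation with dynamic features.

```python
from typing import List, Sequence


def state_boundaries(durations: Sequence[int]) -> List[int]:
    """Return the (exclusive) frame index at which each state ends."""
    bounds: List[int] = []
    total = 0
    for d in durations:
        total += d
        bounds.append(total)
    return bounds


def expand(means: Sequence[float], durations: Sequence[int]) -> List[float]:
    """Repeat each state's mean for its duration (no dynamic-feature smoothing)."""
    if len(means) != len(durations):
        raise ValueError("need exactly one duration per state")
    frames: List[float] = []
    for m, d in zip(means, durations):
        frames.extend([m] * d)
    return frames


# Hypothetical 3-state example; values are illustrative, not from the paper.
spec_means = [1.0, 2.0, 3.0]   # one spectral coefficient per state
f0_means = [5.0, 5.2, 4.9]     # log-F0 value per state
spec_dur = [3, 4, 3]           # spectrum-stream state durations (frames)
f0_dur_async = [2, 5, 3]       # independent F0-stream durations, same total length

spec = expand(spec_means, spec_dur)
# Synchronous: F0 state boundaries coincide with the spectrum's.
sync_f0 = expand(f0_means, spec_dur)
# Asynchronous: F0 state boundaries are placed independently.
async_f0 = expand(f0_means, f0_dur_async)

# Both streams still cover the same utterance length.
assert len(spec) == len(sync_f0) == len(async_f0)

print(state_boundaries(spec_dur))      # [3, 7, 10]
print(state_boundaries(f0_dur_async))  # [2, 7, 10]
```

Keeping the total frame count equal across streams reflects the assumption that both features must still describe the same utterance, while the internal state boundaries are free to differ.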
Bibliographic reference. Wang, Cheng-Cheng / Ling, Zhen-Hua / Dai, Li-Rong (2009): "Asynchronous F0 and spectrum modeling for HMM-based speech synthesis", In INTERSPEECH-2009, 404-407.