13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Considering Global Variance of the Log Power Spectrum Derived from Mel-Cepstrum in HMM-based Parametric Speech Synthesis

Xiang Yin, Zhen-Hua Ling, Ming Lei, Lirong Dai

iFLYTEK Speech Lab, University of Science and Technology of China

This paper utilizes the global variance (GV) of the log power spectrum (LPS) derived from the mel-cepstrum to improve hidden Markov model (HMM) based parametric speech synthesis. To alleviate the over-smoothing effect on the generated spectral structures, an LPS-GV modeling method using line spectral pairs (LSPs) was proposed in our previous work, where the estimated distribution of LPS-GV was combined with the trained acoustic models to determine the optimal spectral features at synthesis time. In this paper, we extend this method to the case where mel-cepstral coefficients are used as spectral features. Further, we propose a method of integrating LPS-GV distortions into the minimum generation error (MGE) model training criterion, in order to avoid the high computational complexity of the parameter generation algorithm that considers the GV model. Experimental results show that the parameter generation algorithm using the LPS-GV model produces more natural acoustic features than the conventional GV modeling method when mel-cepstrum features are used. In addition, integrating LPS-GV distortions into the model training criterion achieves performance similar to that of applying the LPS-GV model at synthesis time.
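
As a rough illustration of the quantity discussed above, the following sketch computes a log power spectrum from mel-cepstral coefficients and then takes its variance across the frames of one utterance. It is not the authors' implementation. It assumes the standard mel-cepstral spectral model, log|H(e^{jw})|^2 = 2 * sum_m c(m) cos(m * w_warped), with first-order all-pass frequency warping, and it assumes that LPS-GV is the per-utterance variance over time at each frequency bin. The function names, the warping factor alpha = 0.42, and the 257 frequency bins are hypothetical choices made only for this example.

```python
import numpy as np


def warped_frequencies(n_bins, alpha):
    """Map a uniform grid on [0, pi] through the first-order all-pass warping.

    Assumption: the mel-cepstrum is defined on this warped frequency axis.
    """
    omega = np.linspace(0.0, np.pi, n_bins)
    return omega + 2.0 * np.arctan2(alpha * np.sin(omega),
                                    1.0 - alpha * np.cos(omega))


def mcep_to_lps(mcep, n_bins=257, alpha=0.42):
    """Convert mel-cepstra of shape (frames, order + 1) to a log power spectrum.

    Returns an array of shape (frames, n_bins). The mapping is linear in the
    cepstral coefficients: LPS = 2 * mcep @ cos(m * warped_omega).
    """
    m = np.arange(mcep.shape[1])
    basis = np.cos(np.outer(m, warped_frequencies(n_bins, alpha)))
    return 2.0 * mcep @ basis


def global_variance(features):
    """Variance over frames for each feature dimension (one utterance)."""
    return features.var(axis=0)


# Illustrative input only: random values standing in for real mel-cepstra.
rng = np.random.default_rng(0)
mcep = rng.normal(scale=0.1, size=(100, 25))   # 100 frames, order 24
lps_gv = global_variance(mcep_to_lps(mcep))    # one value per frequency bin
mcep_gv = global_variance(mcep)                # conventional GV, per coefficient
```

Because the LPS is a linear function of the mel-cepstrum under this model, the variance at each frequency bin depends on the covariances between cepstral coefficients, whereas the conventional GV in this sketch looks at each coefficient separately.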

Index Terms: Speech synthesis, hidden Markov model, global variance, log power spectrum

Bibliographic reference.  Yin, Xiang / Ling, Zhen-Hua / Lei, Ming / Dai, Lirong (2012): "Considering global variance of the log power spectrum derived from mel-cepstrum in HMM-based parametric speech synthesis", In INTERSPEECH-2012, 1147-1150.