This paper presents a method for modeling the global variance (GV) of the log power spectra derived from line spectral pairs (LSPs) in a sentence for HMM-based parametric speech synthesis. Unlike the conventional GV method, in which the observations for GV model training are the variances of the spectral parameters over each training sentence, the proposed method directly models the temporal variance at each frequency point of the spectral envelope reconstructed from the LSPs. At the synthesis stage, the likelihood function of the trained GV model is integrated into the maximum likelihood parameter generation algorithm to alleviate the over-smoothing of the generated spectral structures. Experimental results show that, when LSPs are used as the spectral parameters, the proposed method outperforms the conventional GV method and significantly improves the naturalness of synthetic speech.
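As a rough illustration of the statistic described above, the following NumPy sketch computes the per-frequency temporal variance of the log power spectrum reconstructed from a sentence's LSP frames. It is not the authors' implementation: the function names (`lsp_to_lpc`, `log_power_spectrum`, `log_spectrum_gv`), the number of frequency points, and the omission of the gain term are all assumptions made for the example, and the LSPs are assumed to be of even order, given in radians, and sorted in ascending order within (0, pi).

```python
import numpy as np

def lsp_to_lpc(lsp):
    """Convert even-order LSP frequencies (radians, ascending) to the
    coefficients of the LPC inverse filter A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    lsp = np.asarray(lsp, dtype=float)
    p_poly = np.array([1.0, 1.0])    # symmetric polynomial P(z), root at z = -1
    q_poly = np.array([1.0, -1.0])   # antisymmetric polynomial Q(z), root at z = 1
    for w in lsp[0::2]:              # 1st, 3rd, ... LSPs are roots of P(z)
        p_poly = np.convolve(p_poly, [1.0, -2.0 * np.cos(w), 1.0])
    for w in lsp[1::2]:              # 2nd, 4th, ... LSPs are roots of Q(z)
        q_poly = np.convolve(q_poly, [1.0, -2.0 * np.cos(w), 1.0])
    a = 0.5 * (p_poly + q_poly)      # A(z) = (P(z) + Q(z)) / 2
    return a[:-1]                    # the z^-(p+1) coefficient cancels to zero

def log_power_spectrum(lsp, n_freq=64):
    """Log power spectrum -log|A(e^{jw})|^2 on n_freq points in [0, pi].
    The gain term is omitted (an assumption of this sketch)."""
    a = lsp_to_lpc(lsp)
    spec = np.fft.rfft(a, n=2 * (n_freq - 1))
    return -np.log(np.abs(spec) ** 2)

def log_spectrum_gv(lsp_frames, n_freq=64):
    """Temporal variance of the log power spectrum at each frequency point,
    computed over all frames of one sentence (rows of lsp_frames)."""
    logspec = np.stack([log_power_spectrum(f, n_freq) for f in lsp_frames])
    return logspec.var(axis=0)

if __name__ == "__main__":
    # Hypothetical sentence: 50 frames of random, sorted 10th-order LSPs.
    rng = np.random.default_rng(0)
    frames = np.sort(rng.uniform(0.1, 3.0, size=(50, 10)), axis=1)
    print(log_spectrum_gv(frames).shape)
```

In a GV model of this kind, one such variance vector per training sentence would serve as an observation for fitting the GV distribution; how the resulting likelihood is combined with parameter generation is described in the paper itself and is not reproduced here.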
Bibliographic reference. Ling, Zhen-Hua / Hu, Yu / Dai, Lirong (2010): "Global variance modeling on the log power spectrum of LSPs for HMM-based speech synthesis", In INTERSPEECH-2010, 825-828.