Integration of intonation in trainable speech synthesis

Lidong Luo, Xingchi Xian

Recent work in speech synthesis places increasing emphasis on spectral continuity and diverse prosodic effects. In recent studies, the trainable HMM-based speech synthesis method has produced a more continuous spectral structure than the unit selection method, but the pitch contour it generates tends to be over-smoothed and lacks syllable-level variance in Chinese. In this paper, to synthesize speaker-dependent speech with a specific prosodic style, we model global intonation in Chinese at the syllable scale by defining a pitch level, and we use statistical pitch level prediction to improve the prosodic effects of speech generated by the HMM-based synthesis method.

