This paper studies the importance of short-term spectral and excitation parameterizations for emotional hidden Markov model (HMM)-based speech synthesis. The analysis is carried out through an emotion classification task using two methods: K-means emotion clustering and Gaussian mixture model (GMM)-based emotion identification. Two known parameterizations of the short-term speech spectral envelope, the mel-cepstrum and the mel-line spectrum pairs, are used as spectral parameters, while features derived from the complex cepstrum and group delay, together with band-aperiodicity coefficients, are used as excitation parameters. The features found to be emotion-dependent according to classification performance are then selected to train emotion-dependent HMM-based synthesizers. Finally, listening tests are performed to verify the impact of the parameters on the similarity of the synthesized speech to its natural version.
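As a rough illustration of the K-means emotion clustering idea mentioned in the abstract, the following minimal sketch clusters toy feature vectors and checks how well the clusters line up with emotion labels. Everything specific here is an assumption made for illustration and is not taken from the paper: the two-dimensional toy features, the emotion names, the first-k-points initialization, and the use of cluster purity as the score.

```python
import math
from collections import Counter


def kmeans(points, k, iters=50):
    """Plain Lloyd's K-means. Centroids start at the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        new_assign = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assign) if a == c]
            if members:
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
    return assign, centroids


def purity(assign, labels):
    """Fraction of samples that carry the majority label of their cluster."""
    total = 0
    for c in set(assign):
        counts = Counter(l for a, l in zip(assign, labels) if a == c)
        total += counts.most_common(1)[0][1]
    return total / len(labels)


# Hypothetical 2-D feature vectors standing in for per-utterance speech
# parameters (e.g. averaged spectral or excitation coefficients).
features = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.3),   # labelled "neutral"
    (2.0, 2.1), (2.2, 1.9), (1.9, 2.3),   # labelled "angry"
]
labels = ["neutral"] * 3 + ["angry"] * 3

assign, centroids = kmeans(features, k=2)
score = purity(assign, labels)
print("cluster assignments:", assign)
print("purity:", score)
```

Under this reading, a parameterization whose clusters reach high purity would count as more emotion-dependent than one whose clusters mix the emotions, which mirrors the feature-selection step described in the abstract.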
Index Terms: speech synthesis, statistical parametric speech synthesis, expressive speech synthesis
Bibliographic reference. Maia, Ranniery / Akamine, Masami (2012): "Analysis on the importance of short-term speech parameterizations for emotional statistical parametric speech synthesis", In INTERSPEECH-2012, 1632-1635.