13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Analysis on the Importance of Short-Term Speech Parameterizations for Emotional Statistical Parametric Speech Synthesis

Ranniery Maia (1), Masami Akamine (2)

(1) Cambridge Research Laboratory, Toshiba Research Europe Ltd, Cambridge, UK
(2) Corporate Research and Development Center, Toshiba Corporation, Kawasaki, Japan

This paper presents a study on the importance of short-term spectral and excitation parameterizations for emotional hidden Markov model (HMM)-based speech synthesis. The analysis is performed through an emotion classification task using two methods: K-means emotion clustering and Gaussian mixture model (GMM)-based emotion identification. Two known parameterizations of the short-term speech spectral envelope, the mel-cepstrum and the mel-line spectrum pairs, are used, while features derived from the complex cepstrum and group delay, together with band-aperiodicity coefficients, serve as excitation parameters. The features found to be emotion-dependent according to classification performance are then selected to train emotion-dependent HMM-based synthesizers. Finally, listening tests are performed to verify the impact of these parameters on the similarity between the synthesized speech and its natural counterpart.
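
As a rough illustration of the K-means emotion clustering idea described above, the following minimal sketch clusters toy feature vectors and scores how well the clusters line up with emotion labels. Everything in it is an assumption made for illustration and is not taken from the paper: the emotion label set, the synthetic 2-D features (standing in for mel-cepstrum, mel-line spectrum pair, or excitation features), the deterministic farthest-point initialization, and the use of cluster purity as the score.

```python
import math
import random
from collections import Counter

EMOTIONS = ["neutral", "happy", "sad"]  # hypothetical label set


def make_features(means, n_per_emotion, seed):
    """Draw toy 2-D feature vectors around one mean per emotion."""
    rng = random.Random(seed)
    points, emotions = [], []
    for emotion, (mx, my) in zip(EMOTIONS, means):
        for _ in range(n_per_emotion):
            points.append((rng.gauss(mx, 0.5), rng.gauss(my, 0.5)))
            emotions.append(emotion)
    return points, emotions


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, max_iters=50):
    """Plain K-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(max_iters):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(d) / len(members) for d in zip(*members)))
            else:
                updated.append(centroids[j])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids)


def purity(labels, emotions):
    """Fraction of points that belong to the majority emotion of their cluster."""
    total = 0
    for j in set(labels):
        members = [e for lab, e in zip(labels, emotions) if lab == j]
        total += max(Counter(members).values())
    return total / len(labels)


# A feature set whose distribution depends strongly on emotion ...
strong, emo = make_features([(0, 0), (10, 10), (-10, 10)], 30, seed=0)
# ... and one whose distribution is identical for every emotion.
weak, _ = make_features([(0, 0), (0, 0), (0, 0)], 30, seed=0)

print("purity, emotion-dependent features:  ", purity(kmeans(strong, 3), emo))
print("purity, emotion-independent features:", purity(kmeans(weak, 3), emo))
```

In this toy setting, a parameterization that carries emotion information yields clusters that match the emotion labels closely, while one that does not yields mixed clusters. A comparison of this kind is one plausible way to rank feature streams before training emotion-dependent synthesizers.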

Index Terms: speech synthesis, statistical parametric speech synthesis, expressive speech synthesis


Bibliographic reference. Maia, Ranniery / Akamine, Masami (2012): "Analysis on the importance of short-term speech parameterizations for emotional statistical parametric speech synthesis", in INTERSPEECH-2012, 1632-1635.