INTERSPEECH 2007
8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

On the Limitations of Voice Conversion Techniques in Emotion Identification Tasks

R. Barra, J. M. Montero, J. Macias-Guarasa, J. Gutiérrez-Arriola, J. Ferreiros, J. M. Pardo

Universidad Politécnica de Madrid, Spain

The growing interest in emotional speech synthesis calls for effective emotion conversion techniques. This paper estimates the relevance of three speech components (spectral envelope, residual excitation, and prosody) for synthesizing identifiable emotional speech, so that voice conversion techniques can be customized to the specific characteristics of each emotion. The analysis is based on a listening test with a set of synthetic mixed-emotion utterances that draw their speech components from emotional and neutral recordings. Results show the importance of transforming residual excitation for the identification of emotions that are not fully conveyed through prosodic means (such as cold anger or sadness in our Spanish corpus).
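As a rough illustration of the mixed-emotion design described in the abstract, the sketch below enumerates the ways the three speech components could each be drawn from either a neutral or an emotional recording. It assumes every component is chosen independently, which gives eight combinations; the abstract does not say which combinations were actually used in the listening test, and the names `COMPONENTS`, `SOURCES`, and `mixed_conditions` are hypothetical.

```python
from itertools import product

# The three speech components named in the abstract.
COMPONENTS = ("spectral_envelope", "residual_excitation", "prosody")
# Assumption: each component comes from one of two recordings of the same utterance.
SOURCES = ("neutral", "emotional")


def mixed_conditions():
    """Return every assignment of a source recording to each component."""
    return [
        dict(zip(COMPONENTS, choice))
        for choice in product(SOURCES, repeat=len(COMPONENTS))
    ]


conditions = mixed_conditions()
for condition in conditions:
    print(condition)
```

Under this assumption, the all-neutral and all-emotional assignments correspond to unmodified reference recordings, and the remaining six are the genuinely mixed stimuli.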


Bibliographic reference. Barra, R. / Montero, J. M. / Macias-Guarasa, J. / Gutiérrez-Arriola, J. / Ferreiros, J. / Pardo, J. M. (2007): "On the limitations of voice conversion techniques in emotion identification tasks", in INTERSPEECH-2007, 2233-2236.