Modeling Pronunciation Variation for Automatic Speech Recognition

Rolduc, The Netherlands
May 4-6, 1998

Pronunciation Variations in Emotional Speech

Thomas S. Polzin, Alexander Waibel

Interactive Systems Laboratories, Carnegie Mellon University, Pittsburgh, PA, USA

In this paper we demonstrate how the emotional state of the speaker influences his or her speech. We show that recognition accuracy varies significantly depending on the emotional state of the speaker. Our system models the pronunciation variation of emotional speech at both the acoustic and the prosodic level. We show that using emotion-specific acoustic and prosodic models allows the system to discriminate among four emotions (happy, sad, angry, and afraid) well above chance level. Finally, we show that emotion-specific modeling improves the word accuracy of the speech recognition system on emotional speech.

Bibliographic reference. Polzin, Thomas S. / Waibel, Alexander (1998): "Pronunciation variations in emotional speech", in MPV-1998, 103-108.