Speech Prosody 2010

Chicago, IL, USA
May 10-14, 2010

Approaching Multi-Lingual Emotion Recognition from Speech - On Language Dependency of Acoustic/Prosodic Features for Anger Recognition

Tim Polzehl (1), Alexander Schmitt (2), Florian Metze (3)

(1) Deutsche Telekom Laboratories / Quality and Usability Lab, Technische Universität Berlin
(2) Dialogue Systems Group, Institute of Information Technology, University of Ulm, Germany
(3) Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA

In this paper, we describe experiments on automatic emotion recognition using comparable speech corpora collected from real-life American English and German Interactive Voice Response (IVR) systems. We compute the optimal set of acoustic and prosodic features for mono-, cross- and multi-lingual anger recognition, and analyze the differences. When an emotion recognition system is confronted with a language it has not been trained on, we typically observe severe degradation in performance. Analyzing this loss, we report on strategies for combining the feature spaces, with and without merging and retraining the mono-lingual systems. We report classification scores and feature sets for various cases, and estimate the relative importance of features on both databases. We compare feature distributions and feature ranks by evaluating the information gain ratio (IGR). After final system integration, we obtain a single bi-lingual anger recognition system which performs just as well as two separate mono-lingual systems on the test data.
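The abstract refers to ranking features by information gain ratio. As a rough illustration of that criterion only (not the authors' implementation; the feature names, binning, and labels below are hypothetical), the following Python sketch computes IGR for a discretized feature against anger/non-anger labels:

    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy of a label sequence, in bits."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain_ratio(feature_values, labels):
        """IGR = (H(labels) - H(labels | feature)) / H(feature).

        `feature_values` must already be discretized (e.g. binned
        acoustic/prosodic measurements); `labels` are the class tags
        (e.g. "anger" / "non-anger").
        """
        n = len(labels)
        # Conditional entropy H(labels | feature): entropy within each
        # feature bin, weighted by the bin's relative frequency.
        h_cond = 0.0
        for value in set(feature_values):
            subset = [l for v, l in zip(feature_values, labels) if v == value]
            h_cond += (len(subset) / n) * entropy(subset)
        info_gain = entropy(labels) - h_cond
        split_info = entropy(feature_values)  # intrinsic entropy of the split
        return info_gain / split_info if split_info > 0 else 0.0

    # Toy example: rank two hypothetical binned features by IGR.
    labels = ["anger", "anger", "neutral", "neutral", "anger", "neutral"]
    pitch_bin = ["high", "high", "low", "low", "high", "low"]
    energy_bin = ["mid", "high", "mid", "low", "high", "mid"]
    print(information_gain_ratio(pitch_bin, labels))   # 1.0: perfectly informative
    print(information_gain_ratio(energy_bin, labels))  # ~0.37: weaker predictor

Normalizing the information gain by the intrinsic entropy of the split is what makes IGR usable as a ranking criterion across features with different numbers of bins: plain information gain would otherwise favor features that split the data into many small partitions.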

Index Terms: emotion recognition, anger classification, IVR speech, IGR, acoustic prosodic features, speech processing

Bibliographic reference. Polzehl, Tim / Schmitt, Alexander / Metze, Florian (2010): "Approaching multi-lingual emotion recognition from speech - on language dependency of acoustic/prosodic features for anger recognition", in SP-2010, paper 442.