Speech Prosody 2010
Chicago, IL, USA
In this paper, we describe experiments on automatic emotion recognition using comparable speech corpora collected from real-life American English and German Interactive Voice Response (IVR) systems. We compute the optimal set of acoustic and prosodic features for mono-, cross-, and multi-lingual anger recognition, and analyze the differences. When an emotion recognition system is confronted with a language it has not been trained on, we normally observe severe degradation in performance. Analyzing this loss, we report on strategies for combining the feature spaces, both with and without merging and retraining the mono-lingual systems. We report classification scores and feature sets for various cases, and estimate the relative importance of the features on both databases. We compare feature distributions and feature ranks by evaluating the information gain ratio. After final system integration, we obtain a single bi-lingual anger recognition system that performs as well on the test data as two separate mono-lingual systems.
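The feature-ranking step above relies on the information gain ratio (IGR). As a minimal illustrative sketch (not the authors' implementation), IGR for a discretized feature against class labels can be computed as the information gain normalized by the split's intrinsic information; the function names and the assumption of pre-binned feature values are hypothetical:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain_ratio(feature_bins, labels):
    """Information gain ratio of a discretized feature w.r.t. class labels."""
    n = len(labels)
    # Group labels by the feature bin they fall into.
    by_bin = {}
    for b, y in zip(feature_bins, labels):
        by_bin.setdefault(b, []).append(y)
    # Conditional entropy H(labels | feature).
    cond = sum(len(ys) / n * entropy(ys) for ys in by_bin.values())
    gain = entropy(labels) - cond
    # Intrinsic information of the split itself normalizes the gain,
    # penalizing features that fragment the data into many bins.
    split_info = entropy(feature_bins)
    return gain / split_info if split_info > 0 else 0.0
```

A feature whose bins perfectly separate anger from non-anger yields an IGR of 1.0, while a bin assignment independent of the labels yields 0.0; ranking features by this score is one common way to compare their relevance across the two corpora.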
Index Terms: emotion recognition, anger classification, IVR speech, IGR, acoustic/prosodic features, speech processing
Bibliographic reference. Polzehl, Tim / Schmitt, Alexander / Metze, Florian (2010): "Approaching multi-lingual emotion recognition from speech - on language dependency of acoustic/prosodic features for anger recognition", In SP-2010, paper 442.