Analysis of Multi-Lingual Emotion Recognition Using Auditory Attention Features

Ozlem Kalinli


In this paper, we build mono-lingual and cross-lingual emotion recognition systems and report performance on English and German databases. The emotion recognition system uses biologically inspired auditory attention features together with a neural network that learns the mapping between features and emotion classes. We first build mono-lingual systems for the Berlin Database of Emotional Speech (EMO-DB) and LDC's Emotional Prosody database (Emo-Prosody), achieving 82.7% and 56.7% accuracy, respectively, for five-class emotion classification (neutral, sad, angry, happy, and boredom) using leave-one-speaker-out cross validation. When tested cross-lingually, five-class recognition accuracy drops to 55.1% for EMO-DB and 41.4% for Emo-Prosody. Finally, we build a bilingual emotion recognition system and report experimental results and their analysis. The bilingual system performs close to the individual mono-lingual systems.
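The evaluation protocol described in the abstract (a neural network mapping precomputed auditory attention features to emotion classes, scored with leave-one-speaker-out cross validation) can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's implementation: the scikit-learn MLP, its topology, and the toy random data stand in for the actual auditory attention features and network used in the paper.

# Minimal sketch of the abstract's evaluation protocol: a neural network
# maps precomputed features to five emotion classes, scored with
# leave-one-speaker-out cross validation. The feature matrix, speaker ids,
# and MLP settings are illustrative assumptions, not the paper's setup.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

EMOTIONS = ["neutral", "sad", "angry", "happy", "boredom"]

def loso_accuracy(X, y, speakers):
    """Leave-one-speaker-out accuracy for a features -> emotion classifier.

    X: (n_utterances, n_features) precomputed acoustic features
    y: (n_utterances,) integer emotion labels indexing EMOTIONS
    speakers: (n_utterances,) speaker id per utterance
    """
    logo = LeaveOneGroupOut()
    correct, total = 0, 0
    for train_idx, test_idx in logo.split(X, y, groups=speakers):
        # Train a fresh classifier with the held-out speaker excluded.
        clf = make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
        )
        clf.fit(X[train_idx], y[train_idx])
        correct += (clf.predict(X[test_idx]) == y[test_idx]).sum()
        total += len(test_idx)
    return correct / total

# Toy usage with random data standing in for EMO-DB / Emo-Prosody features:
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 40))           # 200 utterances, 40-dim features
    y = rng.integers(0, len(EMOTIONS), 200)  # emotion labels
    speakers = rng.integers(0, 10, 200)      # 10 speakers
    print(f"LOSO accuracy: {loso_accuracy(X, y, speakers):.3f}")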


DOI: 10.21437/Interspeech.2016-1557

Cite as

Kalinli, O. (2016) Analysis of Multi-Lingual Emotion Recognition Using Auditory Attention Features. Proc. Interspeech 2016, 3613-3617.

BibTeX
@inproceedings{Kalinli2016,
  author={Ozlem Kalinli},
  title={Analysis of Multi-Lingual Emotion Recognition Using Auditory Attention Features},
  year={2016},
  booktitle={Interspeech 2016},
  doi={10.21437/Interspeech.2016-1557},
  url={http://dx.doi.org/10.21437/Interspeech.2016-1557},
  pages={3613--3617}
}