INTERSPEECH 2009
10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Emotion Recognition Using Linear Transformations in Combination with Video

Rok Gajšek, Vitomir Štruc, Simon Dobrišek, France Mihelič

University of Ljubljana, Slovenia

The paper discusses the use of linear transformations of Hidden Markov Models, normally employed for speaker and environment adaptation, as a way of extracting the emotional components from speech. The Constrained Maximum Likelihood Linear Regression (CMLLR) transformation is used as a feature for classifying the emotional state as either normal or aroused. We present a procedure for incrementally building a set of speaker-independent acoustic models, which are then used to estimate the CMLLR transformations for emotion classification. An audio-video database of spontaneous emotions (AvID) is briefly presented, since it forms the basis for evaluating the proposed method. Emotion classification using the video part of the database is also described, and the added value of combining the visual information with the audio features is shown.
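As a rough illustration of what a CMLLR transformation is, here is a minimal sketch. It assumes the common feature-space form of CMLLR, in which the same affine transform x̂ = A x + b is applied to every observation vector. It also assumes that the transform parameters are turned into a classification feature by flattening the extended matrix W = [b A] row by row. That flattening and the helper names `apply_cmllr` and `cmllr_feature_vector` are hypothetical, chosen for illustration rather than taken from the paper. Estimating A and b (an EM procedure against the speaker-independent HMMs) and the emotion classifier itself are not shown.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def apply_cmllr(A: Matrix, b: Vector, x: Vector) -> Vector:
    """Apply a feature-space CMLLR transform to one observation: x_hat = A x + b."""
    return [
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(A, b)
    ]


def cmllr_feature_vector(A: Matrix, b: Vector) -> Vector:
    """Hypothetical feature: flatten the extended matrix W = [b A] row by row.

    For d-dimensional acoustic features this yields d * (d + 1) values.
    """
    feats: Vector = []
    for row, b_i in zip(A, b):
        feats.append(b_i)   # bias term for this output dimension
        feats.extend(row)   # corresponding row of the linear part A
    return feats


if __name__ == "__main__":
    # Toy 2-dimensional transform; real acoustic feature vectors are much larger.
    A = [[1.0, 0.5], [0.0, 2.0]]
    b = [0.1, -0.2]
    print(apply_cmllr(A, b, [1.0, 2.0]))
    print(cmllr_feature_vector(A, b))
```

Because one transform is estimated per utterance or per speaker segment, the flattened vector has a fixed length regardless of how long the recording is. This property makes such a representation convenient as input to a standard classifier.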


Bibliographic reference. Gajšek, Rok / Štruc, Vitomir / Dobrišek, Simon / Mihelič, France (2009): "Emotion recognition using linear transformations in combination with video", In INTERSPEECH-2009, 1967-1970.