ISCA Archive Interspeech 2009

Emotion recognition using linear transformations in combination with video

Rok Gajšek, Vitomir Štruc, Simon Dobrišek, France Mihelič

The paper discusses the use of linear transformations of Hidden Markov Models, normally employed for speaker and environment adaptation, as a way of extracting the emotional components from speech. A constrained version of the Maximum Likelihood Linear Regression (CMLLR) transformation is used as a feature for classifying the emotional state as normal or aroused. We present a procedure for incrementally building a set of speaker-independent acoustic models, which are used to estimate the CMLLR transformations for emotion classification. An audio-video database of spontaneous emotions (AvID) is briefly presented, since it forms the basis for the evaluation of the proposed method. Emotion classification using the video part of the database is also described, and the added value of combining the visual information with the audio features is shown.
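
As a rough illustration of the idea described above, the sketch below shows how a CMLLR transform, which is an affine map x' = Ax + b applied in feature space, could be flattened into a fixed-length vector and handed to a classifier. The feature dimension, the identity-valued transform, and the function names are hypothetical and are not taken from the paper; how the authors actually construct the feature vector and which classifier they use is not stated in this abstract.

```python
import numpy as np


def apply_cmllr(A, b, x):
    """Apply a constrained MLLR (feature-space) transform: x' = A x + b."""
    return A @ x + b


def cmllr_to_feature(A, b):
    """Stack the transform W = [A | b] into a single feature vector.

    Assumption: the classifier input is simply the flattened transform.
    """
    W = np.hstack([A, b.reshape(-1, 1)])  # shape (d, d + 1)
    return W.ravel()                      # length d * (d + 1)


d = 3                        # hypothetical acoustic feature dimension
A = np.eye(d)                # placeholder transform (identity)
b = np.zeros(d)              # placeholder bias
x = np.array([1.0, 2.0, 3.0])

feature = cmllr_to_feature(A, b)
transformed = apply_cmllr(A, b, x)
```

In practice the transform would be estimated per utterance or per speaker against the speaker-independent acoustic models, so that each recording yields one such vector.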


doi: 10.21437/Interspeech.2009-476

Cite as: Gajšek, R., Štruc, V., Dobrišek, S., Mihelič, F. (2009) Emotion recognition using linear transformations in combination with video. Proc. Interspeech 2009, 1967-1970, doi: 10.21437/Interspeech.2009-476

@inproceedings{gajsek09_interspeech,
  author={Rok Gajšek and Vitomir Štruc and Simon Dobrišek and France Mihelič},
  title={{Emotion recognition using linear transformations in combination with video}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={1967--1970},
  doi={10.21437/Interspeech.2009-476}
}