Cross-Subject Continuous Emotion Recognition Using Speech and Body Motion in Dyadic Interactions

Syeda Narjis Fatima, Engin Erzin


Dyadic interactions encapsulate rich emotional exchange between interlocutors, suggesting a multimodal, cross-speaker and cross-dimensional continuous emotion dependency. This study explores the dynamic inter-attribute emotional dependency at the cross-subject level, with implications for continuous emotion recognition based on speech and body motion cues. We propose a novel two-stage Gaussian Mixture Model mapping framework for the continuous emotion recognition problem. In the first stage, we perform continuous emotion recognition (CER) of both speakers from speech and body motion modalities to estimate activation, valence and dominance (AVD) attributes. In the second stage, we improve the first-stage estimates by performing CER of the selected speaker using her/his speech and body motion modalities as well as the estimated affective attribute(s) of the other speaker. Our experimental evaluations indicate that the second stage, cross-subject continuous emotion recognition (CSCER), provides complementary information for recognizing the affective state and delivers promising improvements on the continuous emotion recognition problem.
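The two-stage GMM mapping described above can be sketched as a joint-GMM regression: fit a Gaussian mixture over stacked [features, AVD targets] and predict the conditional mean E[y|x]; the second stage simply augments the features with the other speaker's (estimated) attributes. The sketch below uses synthetic data and illustrative function names, and is not the authors' implementation; in practice the stage-2 augmentation would use the other speaker's stage-1 estimates rather than ground truth.

```python
# Hypothetical sketch of two-stage GMM mapping for continuous emotion
# recognition (CER/CSCER); data and function names are illustrative only.
import numpy as np
from sklearn.mixture import GaussianMixture
from scipy.stats import multivariate_normal

def fit_gmm_mapping(X, Y, n_components=4, seed=0):
    """Fit a joint GMM over stacked [features | targets]."""
    Z = np.hstack([X, Y])
    return GaussianMixture(n_components=n_components,
                           covariance_type="full",
                           random_state=seed).fit(Z)

def gmm_predict(gmm, X, dx):
    """MMSE estimate E[y | x] under the joint GMM (dx = feature dim)."""
    dy = gmm.means_.shape[1] - dx
    preds = np.zeros((len(X), dy))
    for i, x in enumerate(X):
        w = np.zeros(gmm.n_components)
        cond = np.zeros((gmm.n_components, dy))
        for k in range(gmm.n_components):
            mu, S = gmm.means_[k], gmm.covariances_[k]
            mux, muy = mu[:dx], mu[dx:]
            Sxx, Syx = S[:dx, :dx], S[dx:, :dx]
            # responsibility of component k for x, and conditional mean
            w[k] = gmm.weights_[k] * multivariate_normal.pdf(x, mux, Sxx)
            cond[k] = muy + Syx @ np.linalg.solve(Sxx, x - mux)
        preds[i] = (w / w.sum()) @ cond
    return preds

# Synthetic dyadic data: per-frame speech/body features and AVD targets.
rng = np.random.default_rng(0)
n, dx = 400, 6
X_a = rng.normal(size=(n, dx))                               # speaker A features
Y_a = 0.8 * X_a[:, :3] + rng.normal(scale=0.2, size=(n, 3))  # A's AVD
Y_b = -Y_a + rng.normal(scale=0.3, size=(n, 3))              # B's AVD (coupled)

# Stage 1 (CER): estimate A's AVD from A's own modalities.
stage1 = gmm_predict(fit_gmm_mapping(X_a, Y_a), X_a, dx)

# Stage 2 (CSCER): augment A's features with B's attributes
# (stage-1 estimates of B in practice; true values here for brevity).
X_aug = np.hstack([X_a, Y_b])
stage2 = gmm_predict(fit_gmm_mapping(X_aug, Y_a), X_aug, dx + 3)
```

When the interlocutors' affective states are genuinely coupled, the augmented stage-2 mapping has access to cues absent from the speaker's own modalities, which is the intuition behind the reported CSCER gains.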


DOI: 10.21437/Interspeech.2017-1413

Cite as: Fatima, S.N., Erzin, E. (2017) Cross-Subject Continuous Emotion Recognition Using Speech and Body Motion in Dyadic Interactions. Proc. Interspeech 2017, 1731-1735, DOI: 10.21437/Interspeech.2017-1413.


@inproceedings{Fatima2017,
  author={Syeda Narjis Fatima and Engin Erzin},
  title={Cross-Subject Continuous Emotion Recognition Using Speech and Body Motion in Dyadic Interactions},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1731--1735},
  doi={10.21437/Interspeech.2017-1413},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1413}
}