ISCA Archive SMM 2020

Multimodal emotion recognition: Understanding the production process before modeling multimodal behaviors

Carlos Busso

The verbal and non-verbal channels of human communication are internally and intricately connected. As a result, gestures and speech exhibit high levels of correlation and coordination. This relationship is strongly affected by the linguistic and emotional content of the message being communicated. The interplay is observed across communication channels, including various aspects of speech, facial expressions, and movements of the hands, head, and body. For example, facial expressions and speech prosody tend to show stronger emotional modulation when the vocal tract is physically constrained by articulation to convey other linguistic communicative goals. This interplay also affects the gestures and speech events observed across individuals. Psycholinguistic studies on human communication have shown that, during interaction, individuals tend to adapt their behaviors, mimicking the speaking style, gestures, and expressions of their conversational partners. This synchronization pattern is referred to as entrainment. We study the presence of entrainment at the emotion level in cross-modality settings and its implications for multimodal emotion recognition systems. The analysis explores the relationship between the acoustic features of the speaker and the facial expressions of the interlocutor during dyadic interactions, revealing a strong mutual influence in their expressive behaviors. The seminar will discuss the clear implications of these results for audiovisual emotion recognition and other areas of affective computing.
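
As a concrete illustration of the kind of cross-modal entrainment analysis described above, the sketch below correlates a speaker's acoustic feature track with the interlocutor's facial feature track at a range of time lags. This is a minimal sketch assuming synchronized frame-level features; the feature names (pitch, smile intensity) and the lag-based Pearson measure are hypothetical choices for exposition, not the exact analysis behind the talk.

import numpy as np

def lagged_correlation(speaker_acoustic, listener_facial, max_lag):
    # Pearson correlation between two synchronized feature tracks at
    # frame lags in [-max_lag, max_lag]. A positive lag means the
    # listener's facial behavior trails the speaker's speech.
    a = np.asarray(speaker_acoustic, dtype=float)
    f = np.asarray(listener_facial, dtype=float)
    n = min(a.size, f.size)
    a, f = a[:n], f[:n]
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag > 0:
            x, y = a[:-lag], f[lag:]
        elif lag < 0:
            x, y = a[-lag:], f[:lag]
        else:
            x, y = a, f
        scores[lag] = float(np.corrcoef(x, y)[0, 1])
    return scores

# Toy usage: a pitch-like track and a facial track that follows it with
# a short delay should yield the strongest correlation at a positive lag.
rng = np.random.default_rng(0)
pitch = rng.standard_normal(500)
smile = np.roll(pitch, 5) + 0.3 * rng.standard_normal(500)
scores = lagged_correlation(pitch, smile, max_lag=10)
best = max(scores, key=scores.get)
print(f"strongest coupling at lag {best}: r = {scores[best]:.2f}")

Peaks at positive lags would indicate that the listener's facial behavior follows the speaker's speech, one simple signature of the mutual influence the abstract describes.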


Cite as: Busso, C. (2020) Multimodal emotion recognition: Understanding the production process before modeling multimodal behaviors. Proc. Workshop on Speech, Music and Mind (SMM 2020).

@inproceedings{busso20_smm,
  author={Carlos Busso},
  title={{Multimodal emotion recognition: Understanding the production process before modeling multimodal behaviors}},
  year=2020,
  booktitle={Proc. Workshop on Speech, Music and Mind (SMM 2020)},
  pages={}
}