To understand the factors that influence auditory and visual emotion recognition performance, we examined a perception set of stimuli produced by three talkers who differed in how well people could recognize their emotions. Our proposal was that productions based on a model of prototypical emotion attributes would be more consistent and better recognized. To test this, we trained a classification model on a parallel holdout set of stimuli by the same talkers and examined the consistency of each talker's emotion portrayals. We found that emotion stimuli from a talker who produced more consistent portrayals were better recognized than stimuli that were produced less consistently.
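As a rough illustration of this approach, the sketch below trains a classifier on a talker's holdout portrayals and scores how reliably the intended emotion is recovered from that talker's perception-set stimuli. This is not the authors' implementation: the feature representation, the linear SVM, and the data layout are all assumptions made for the example.

import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def talker_consistency(holdout_X, holdout_y, test_X, test_y):
    """Train on a talker's holdout portrayals, then score how often the
    classifier recovers the intended emotion from the perception-set stimuli."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    clf.fit(holdout_X, holdout_y)      # parallel holdout stimuli from the same talker
    return clf.score(test_X, test_y)   # proportion classified as the intended emotion

# Hypothetical usage with random placeholder data: rows are stimuli, columns are
# (assumed) acoustic or visual features, labels are intended emotion categories.
rng = np.random.default_rng(0)
X_hold, y_hold = rng.normal(size=(60, 20)), rng.integers(0, 5, 60)
X_test, y_test = rng.normal(size=(30, 20)), rng.integers(0, 5, 30)
print(f"Talker consistency score: {talker_consistency(X_hold, y_hold, X_test, y_test):.2f}")

Under this framing, a higher score for a given talker would indicate more consistent portrayals, which the paper links to better human recognition.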
Cite as: Davis, C., Kim, J. (2019) Auditory and Visual Emotion Recognition: Investigating why some portrayals are better recognized than others. Proc. The 15th International Conference on Auditory-Visual Speech Processing, 33-37, doi: 10.21437/AVSP.2019-7
@inproceedings{davis19_avsp,
  author={Chris Davis and Jeesun Kim},
  title={{Auditory and Visual Emotion Recognition: Investigating why some portrayals are better recognized than others}},
  year=2019,
  booktitle={Proc. The 15th International Conference on Auditory-Visual Speech Processing},
  pages={33--37},
  doi={10.21437/AVSP.2019-7}
}