Auditory-Visual Speech Processing (AVSP) 2010

Hakone, Kanagawa, Japan
September 30-October 3, 2010

Emotion Perception by Eye and Ear and Halves and Wholes

Jeesun Kim, Chris Davis

MARCS Auditory Laboratories, University of Western Sydney, Australia

How is the perception of emotion affected by the provision of multiple sources of information (both within and across modalities)? We examined how the perception of emotion differed depending on which face regions were visible and which modality (auditory, visual, or combined auditory-visual, AV) was used. Auditory and visual speech of five talkers expressing anger, disgust, fear, happiness, sadness, surprise, or neutral emotion were presented in face-only, voice-only, and face-voice conditions. The visual speech stimuli showed the upper face, the lower face, or the whole face. Participants judged which emotion was expressed. The results showed that the upper and lower parts of the talker’s face were not equally informative across emotion types, and that the face and voice conveyed different degrees and types of emotion information. Response confusion matrices showed that, depending on the emotion, the whole-face response pattern resembled either the upper-face or the lower-face pattern. For the AV face-voice stimuli, the response pattern changed depending on the relative informativeness of the unimodal signals. Based on these results, we suggest a model of how emotion information from different sources is combined to drive perception.
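Although the paper reports behavioural results rather than code, the confusion-matrix analysis described above is straightforward to sketch. The snippet below is a minimal illustration, not taken from the paper: it builds a row-normalised emotion confusion matrix for one presentation condition and correlates response patterns across conditions, e.g. to ask whether the whole-face pattern resembles the upper- or lower-face one. The emotion labels follow the abstract; the trial data, function names, and the use of Pearson correlation as the similarity measure are all assumptions for illustration.

```python
import numpy as np

# Emotion categories listed in the abstract.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]
IDX = {e: i for i, e in enumerate(EMOTIONS)}

def confusion_matrix(trials):
    """Build a row-normalised confusion matrix from (expressed, responded)
    pairs: rows index the expressed emotion, columns the reported one, and
    each row sums to 1 (the response distribution for that emotion)."""
    m = np.zeros((len(EMOTIONS), len(EMOTIONS)))
    for expressed, responded in trials:
        m[IDX[expressed], IDX[responded]] += 1
    sums = m.sum(axis=1, keepdims=True)
    return np.divide(m, sums, out=np.zeros_like(m), where=sums > 0)

def pattern_similarity(a, b, emotion):
    """Pearson correlation between the response distributions that two
    conditions produce for one expressed emotion (one possible measure
    of how closely the response patterns resemble each other)."""
    i = IDX[emotion]
    return np.corrcoef(a[i], b[i])[0, 1]

# Toy trials for two presentation conditions (hypothetical data).
upper = confusion_matrix([("fear", "fear"), ("fear", "surprise"),
                          ("happiness", "neutral")])
whole = confusion_matrix([("fear", "fear"), ("fear", "fear"),
                          ("happiness", "happiness")])
print(pattern_similarity(whole, upper, "fear"))  # similarity of the "fear" rows
```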

Index Terms: Emotion perception, Auditory-Visual perception, Emotion recognition, Visual speech, Face and voice.


Bibliographic reference. Kim, Jeesun / Davis, Chris (2010): "Emotion perception by eye and ear and halves and wholes", In AVSP-2010, paper S3-2.