International Conference on Auditory-Visual Speech Processing 2008

Tangalooma Wild Dolphin Resort, Moreton Island, Queensland, Australia
September 26-29, 2008

A Multilevel Fusion Approach for Audiovisual Emotion Recognition

Girija Chetty, Michael Wagner

National Centre for Biometric Studies, Faculty of Information Sciences and Engineering, University of Canberra, Australia

Human-computer interaction will be more natural if computers can perceive and respond to human nonverbal communication such as emotions. Although several approaches have been proposed to recognize human emotions from facial expressions or speech, relatively little work has been done to fuse the two modalities to improve the accuracy and robustness of the emotion recognition system. This paper analyzes the strengths and limitations of systems based only on facial expressions or only on acoustic information. It also analyzes two approaches used to fuse these two modalities, decision-level and feature-level integration, and proposes a new multilevel fusion approach for enhancing the person-dependent and person-independent classification performance for different emotions. Two audiovisual emotion data corpora were used to evaluate the proposed fusion approach: DaFEx [1,2] and ENTERFACE [3], comprising audiovisual emotion data from several actors portraying six different emotions: anger, disgust, fear, happiness, sadness and surprise. The results of the experimental study reveal that, for the emotions considered, the system based on fusion of facial expressions with acoustic information yields better performance than the system based on acoustic information or facial expressions alone. The results also show that the multilevel fusion approach improves classification performance for the different emotions compared with either feature-level or score-level fusion.
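The difference between the fusion levels named above can be illustrated with a minimal sketch. It is not the authors' implementation: the feature dimensions, the random linear classifiers, the equal fusion weights, and all function names are assumptions made for illustration. In particular, combining unimodal scores with the scores of a feature-fused classifier is only one plausible reading of "multilevel" fusion, since the abstract does not describe the actual scheme.

```python
import numpy as np

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

def softmax(z):
    """Turn raw classifier outputs into scores that sum to 1."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def feature_level_fusion(audio_feat, visual_feat):
    """Feature-level fusion: concatenate the two feature vectors."""
    return np.concatenate([audio_feat, visual_feat], axis=-1)

def score_level_fusion(scores, weights):
    """Score-level fusion: weighted average of per-class scores."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    stacked = np.stack(scores, axis=0)  # shape (n_sources, n_classes)
    return np.tensordot(weights, stacked, axes=1)

def linear_scores(features, W, b):
    """Hypothetical stand-in classifier: linear layer followed by softmax."""
    return softmax(features @ W + b)

# Toy data: feature sizes (12 acoustic, 20 visual) are arbitrary assumptions.
rng = np.random.default_rng(0)
audio_feat = rng.normal(size=12)
visual_feat = rng.normal(size=20)
n_classes = len(EMOTIONS)

# Random weights stand in for trained classifiers (assumption).
W_a, b_a = rng.normal(size=(12, n_classes)), np.zeros(n_classes)
W_v, b_v = rng.normal(size=(20, n_classes)), np.zeros(n_classes)
W_f, b_f = rng.normal(size=(32, n_classes)), np.zeros(n_classes)

# Feature level: one classifier on the concatenated features.
fused_feat = feature_level_fusion(audio_feat, visual_feat)
p_fused = linear_scores(fused_feat, W_f, b_f)

# Score level: separate classifiers, scores combined afterwards.
p_audio = linear_scores(audio_feat, W_a, b_a)
p_visual = linear_scores(visual_feat, W_v, b_v)
p_score = score_level_fusion([p_audio, p_visual], [0.5, 0.5])

# Assumed multilevel variant: combine unimodal and feature-fused scores.
p_multi = score_level_fusion([p_audio, p_visual, p_fused], [1.0, 1.0, 1.0])
predicted = EMOTIONS[int(np.argmax(p_multi))]
```

In practice the fusion weights would be tuned on held-out data rather than fixed as equal, and the random linear layers would be replaced by classifiers trained on each corpus.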


  1. Battocchi, A.; Pianesi, F. 2004. DaFEx: Un Database di Espressioni Facciali Dinamiche. In Proceedings of the SLI-GSCP Workshop "Comunicazione Parlata e Manifestazione delle Emozioni", Padova (Italy), 30 Novembre - 1 Dicembre 2004.
  2. Mana N., Cosi P., Tisato G., Cavicchio F., Magno E. and Pianesi F., An Italian Database of Emotional Speech and Facial Expressions, In Proceedings of "Workshop on Emotion: Corpora for Research on Emotion and Affect", in association with "5th International Conference on Language, Resources and Evaluation" (LREC2006), Genoa, Italy, 24-26 May 2006.
  3. Martin O., Adell J., Huerta A., Kotsia I., Savran A., Sebbe R., Multimodal Caricatural Mirror, Proceedings Enterface’05 Workshop, (ISCA Archive,

Bibliographic reference.  Chetty, Girija / Wagner, Michael (2008): "A multilevel fusion approach for audiovisual emotion recognition", In AVSP-2008, 115-120.