International Conference on Auditory-Visual Speech Processing 2008
Tangalooma Wild Dolphin Resort,
Moreton Island, Queensland, Australia
Human-computer interaction will be more natural if computers are able to perceive and respond to human nonverbal communication such as emotions. Although several approaches have been proposed to recognize human emotions based on facial expressions or speech, relatively limited work has been done to fuse these two modalities to improve the accuracy and robustness of the emotion recognition system. This paper analyzes the strengths and the limitations of systems based only on facial expressions or acoustic information. It also analyzes two approaches used to fuse these two modalities, decision-level and feature-level integration, and proposes a new multilevel fusion approach for enhancing the person-dependent and person-independent classification performance for different emotions. Two different audiovisual emotion data corpora were used for evaluating the proposed fusion approach, DaFEx [1,2] and ENTERFACE, comprising audiovisual emotion data from several actors eliciting six different emotions: anger, disgust, fear, happiness, sadness and surprise. The results of the experimental study reveal that the system based on fusion of facial expressions with acoustic information yields better performance than the system based on acoustic information or facial expressions alone, for the emotions considered. Results also show an improvement in the classification performance for the different emotions with the multilevel fusion approach as compared to either feature-level or score-level fusion.
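The two fusion strategies the abstract contrasts can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the weighted-sum combination rule, and the example weights are all hypothetical, chosen to show the structural difference between feature-level integration (concatenating modality features before classification) and score-level integration (combining per-modality classifier outputs).

```python
def feature_level_fusion(audio_feats, video_feats):
    """Feature-level integration: concatenate per-modality feature
    vectors into one vector, which a single classifier then consumes."""
    return list(audio_feats) + list(video_feats)


def score_level_fusion(audio_scores, video_scores, w_audio=0.5):
    """Decision/score-level integration: combine per-emotion scores
    from separate audio and video classifiers with a weighted sum.
    The weight w_audio is a hypothetical tuning parameter."""
    w_video = 1.0 - w_audio
    return {emotion: w_audio * audio_scores[emotion]
                     + w_video * video_scores[emotion]
            for emotion in audio_scores}


def predict(fused_scores):
    """Pick the emotion with the highest fused score."""
    return max(fused_scores, key=fused_scores.get)
```

A multilevel scheme, as the abstract describes it, would combine both: fuse at the feature level where modalities are complementary and additionally fuse the resulting classifier scores, rather than committing to a single integration point.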
Bibliographic reference. Chetty, Girija / Wagner, Michael (2008): "A multilevel fusion approach for audiovisual emotion recognition", In AVSP-2008, 115-120.