Speech Emotion Recognition with Emotion-Pair Based Framework Considering Emotion Distribution Information in Dimensional Emotion Space

Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai


In this work, an emotion-pair based framework is proposed for speech emotion recognition, which constructs a more discriminative feature subspace for each pair of different emotions (emotion-pair) to produce more precise binary emotion classification results. Furthermore, it is found that in the dimensional emotion space, the distances between some archetypal emotions are smaller than those between others. Motivated by this, a Naive Bayes classifier based decision fusion strategy is proposed, which aims to capture such emotion distribution information when deciding the final emotion category. We evaluated the classification framework on the USC IEMOCAP database. Experimental results demonstrate that the proposed method outperforms the hierarchical binary decision tree approach in both weighted accuracy (WA) and unweighted accuracy (UA). Moreover, our framework has the advantages that it can be generated fully automatically without empirical guidance and is easier to parallelize.
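The two-stage scheme the abstract describes, per-pair binary classifiers followed by Naive Bayes decision fusion, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the emotion labels, pairwise scores, and uniform priors below are hypothetical placeholders, where in the paper the pairwise probabilities would come from classifiers trained on pair-specific discriminative feature subspaces, and the priors would reflect the emotion distribution in the dimensional emotion space.

```python
import math

EMOTIONS = ["angry", "happy", "neutral", "sad"]

# Hypothetical pairwise classifier outputs for one utterance: for each
# emotion-pair (a, b), the probability that the utterance belongs to a.
pair_scores = {
    ("angry", "happy"): 0.7,
    ("angry", "neutral"): 0.8,
    ("angry", "sad"): 0.9,
    ("happy", "neutral"): 0.6,
    ("happy", "sad"): 0.7,
    ("neutral", "sad"): 0.5,
}

# Placeholder class priors (assumed uniform here).
priors = {e: 0.25 for e in EMOTIONS}

def fuse(pair_scores, priors):
    """Naive-Bayes-style fusion: treat the pairwise decisions as
    conditionally independent given the true emotion and combine
    their log-probabilities with the class prior."""
    log_post = {e: math.log(priors[e]) for e in EMOTIONS}
    for (a, b), p_a in pair_scores.items():
        p_a = min(max(p_a, 1e-6), 1.0 - 1e-6)  # avoid log(0)
        log_post[a] += math.log(p_a)        # evidence for emotion a
        log_post[b] += math.log(1.0 - p_a)  # evidence for emotion b
    return max(log_post, key=log_post.get)

print(fuse(pair_scores, priors))  # -> angry
```

With these toy scores, "angry" wins every pair it appears in, so the fused decision is "angry"; the fusion step matters in the ambiguous cases where the pairwise votes conflict.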


 DOI: 10.21437/Interspeech.2017-619

Cite as: Ma, X., Wu, Z., Jia, J., Xu, M., Meng, H., Cai, L. (2017) Speech Emotion Recognition with Emotion-Pair Based Framework Considering Emotion Distribution Information in Dimensional Emotion Space. Proc. Interspeech 2017, 1238-1242, DOI: 10.21437/Interspeech.2017-619.


@inproceedings{Ma2017,
  author={Xi Ma and Zhiyong Wu and Jia Jia and Mingxing Xu and Helen Meng and Lianhong Cai},
  title={Speech Emotion Recognition with Emotion-Pair Based Framework Considering Emotion Distribution Information in Dimensional Emotion Space},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1238--1242},
  doi={10.21437/Interspeech.2017-619},
  url={http://dx.doi.org/10.21437/Interspeech.2017-619}
}