8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Fusion of Global Statistical and Segmental Spectral Features for Speech Emotion Recognition

Hao Hu, Ming-Xing Xu, Wei Wu

Tsinghua University, China

Speech emotion recognition is an interesting and challenging speech technology with a broad range of applications. In this paper, we propose fusing global statistical features and segmental spectral features at the decision level for speech emotion recognition. Each emotional utterance is scored independently by two recognition systems, one based on global statistics and one based on segmental spectra, and a weighted linear combination of their scores gives the final decision. Experimental results on an emotional speech database demonstrate that the global statistical and segmental spectral features are complementary, and that the proposed fusion approach further improves the performance of the emotion recognition system.
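
The decision-level fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fuse_scores` and `decide`, the weight parameter `weight`, the emotion labels, and the score values are all hypothetical, and the sketch assumes both systems already produce per-emotion scores on a comparable scale (the abstract does not say how scores are normalized or how the weight is chosen).

```python
from typing import Dict


def fuse_scores(
    global_scores: Dict[str, float],
    segmental_scores: Dict[str, float],
    weight: float = 0.5,
) -> Dict[str, float]:
    """Weighted linear combination of two systems' per-emotion scores.

    `weight` is the share given to the global statistics-based system;
    the segmental spectrum-based system receives `1 - weight`.
    Assumption: both score sets cover the same emotion labels and are
    already on a comparable scale.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if global_scores.keys() != segmental_scores.keys():
        raise ValueError("both systems must score the same emotion labels")
    return {
        emotion: weight * global_scores[emotion]
        + (1.0 - weight) * segmental_scores[emotion]
        for emotion in global_scores
    }


def decide(fused_scores: Dict[str, float]) -> str:
    """Return the emotion label with the highest fused score."""
    return max(fused_scores, key=fused_scores.get)


# Illustrative scores for one utterance (made-up values, not from the paper).
example_global = {"anger": 0.6, "happiness": 0.3, "sadness": 0.1}
example_segmental = {"anger": 0.2, "happiness": 0.7, "sadness": 0.1}

fused = fuse_scores(example_global, example_segmental, weight=0.5)
label = decide(fused)
```

In practice the fusion weight would presumably be tuned on held-out data; the value 0.5 here is only a placeholder.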


Bibliographic reference. Hu, Hao / Xu, Ming-Xing / Wu, Wei (2007): "Fusion of global statistical and segmental spectral features for speech emotion recognition", in INTERSPEECH-2007, 2269-2272.