DBN-ivector Framework for Acoustic Emotion Recognition

Rui Xia, Yang Liu


Deep learning and i-vectors have recently been used successfully in speech and speaker recognition. In this work we propose a framework that combines a deep belief network (DBN) with i-vector space modeling for acoustic emotion recognition. We use two types of labels for frame-level DBN training: the first is the vector of posterior probabilities computed from the GMM universal background model (UBM); the second is the predicted label based on the GMMs. The DBN is trained to minimize the error for both label types. After DBN training, we replace the UBM with the DBN-estimated posterior probability vectors for i-vector extraction. Finally, the extracted i-vectors are fed to backend classifiers for emotion recognition. Our experiments on the USC IEMOCAP data show the effectiveness of the proposed DBN-ivector framework; in particular, with decision-level combination, the proposed system yields significant improvements in both unweighted and weighted accuracy.
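The abstract describes a multi-task frame-level training setup: one network output is fit to the soft UBM posterior vector of each frame, another to the hard GMM-predicted label, and the trained posterior head then stands in for the UBM when accumulating i-vector statistics. The paper itself uses a DBN; the sketch below is only a minimal illustration of that joint objective, assuming synthetic data, made-up dimensions (feature size `D`, UBM components `K`, classes `C`), and a shallow tanh network in place of the actual DBN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: feature dim, UBM components, emotion classes, frames.
D, K, C, N = 20, 8, 4, 256

# Synthetic frame features plus stand-ins for the two label types from the
# paper: per-frame GMM-UBM posterior vectors (soft) and predicted labels (hard).
X = rng.normal(size=(N, D))
ubm_post = rng.dirichlet(np.ones(K), size=N)   # soft targets, rows sum to 1
hard_lab = rng.integers(0, C, size=N)          # hard targets

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# One shared hidden layer with two softmax heads (multi-task training).
H = 32
W1 = rng.normal(scale=0.1, size=(D, H)); b1 = np.zeros(H)
Wp = rng.normal(scale=0.1, size=(H, K)); bp = np.zeros(K)  # posterior head
Wl = rng.normal(scale=0.1, size=(H, C)); bl = np.zeros(C)  # label head

Y_lab = np.eye(C)[hard_lab]
lr, losses = 0.1, []
for step in range(200):
    h = np.tanh(X @ W1 + b1)
    p_post = softmax(h @ Wp + bp)
    p_lab = softmax(h @ Wl + bl)
    # Joint cross-entropy: soft UBM-posterior targets + hard label targets.
    loss = (-(ubm_post * np.log(p_post + 1e-12)).sum(1).mean()
            - (Y_lab * np.log(p_lab + 1e-12)).sum(1).mean())
    losses.append(loss)
    # Softmax + cross-entropy gives the simple (prediction - target) gradient.
    g_post = (p_post - ubm_post) / N
    g_lab = (p_lab - Y_lab) / N
    gh = (g_post @ Wp.T + g_lab @ Wl.T) * (1 - h**2)  # tanh derivative
    Wp -= lr * h.T @ g_post; bp -= lr * g_post.sum(0)
    Wl -= lr * h.T @ g_lab;  bl -= lr * g_lab.sum(0)
    W1 -= lr * X.T @ gh;     b1 -= lr * gh.sum(0)

# After training, the posterior head replaces the UBM: its per-frame posterior
# vectors drive the Baum-Welch statistics used for i-vector extraction.
dbn_posteriors = softmax(np.tanh(X @ W1 + b1) @ Wp + bp)
N_k = dbn_posteriors.sum(axis=0)  # zeroth-order statistics per component
```

In the full system these zeroth-order (and corresponding first-order) statistics feed a standard i-vector extractor, and the resulting i-vectors go to the backend emotion classifiers.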


DOI: 10.21437/Interspeech.2016-488

Cite as

Xia, R., Liu, Y. (2016) DBN-ivector Framework for Acoustic Emotion Recognition. Proc. Interspeech 2016, 480-484.

Bibtex
@inproceedings{Xia+2016,
  author={Rui Xia and Yang Liu},
  title={DBN-ivector Framework for Acoustic Emotion Recognition},
  year=2016,
  booktitle={Interspeech 2016},
  doi={10.21437/Interspeech.2016-488},
  url={http://dx.doi.org/10.21437/Interspeech.2016-488},
  pages={480--484}
}