In this paper, we propose using a denoising autoencoder to generate robust feature representations for emotion recognition. The input to the denoising autoencoder is the normalized static feature set, i.e., the state-of-the-art features for emotion recognition. This input is mapped to two hidden representations: one captures the neutral information in the input, and the other extracts the emotional information. Model parameters are learned by minimizing the squared error between the original input and its reconstruction. After pre-training and fine-tuning, we use the hidden representation as input features to an SVM emotion classifier. Experimental results show a significant performance improvement over using the static features directly.
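The encoder/decoder idea above can be sketched in plain Python as a rough illustration. This is not the authors' implementation. The abstract gives no layer sizes, activation functions, noise type, learning rate, or mechanism that makes one hidden block "neutral" and the other "emotional", so every one of those choices is an assumption. The names `zscore` and `TwoBranchDAE` and the toy data are hypothetical. The sketch simply splits one sigmoid hidden layer into two blocks. It uses masking noise and trains by SGD on the squared error between the *clean* input and the reconstruction of its corrupted version. It does not show pre-training, fine-tuning, or the SVM stage.

```python
import math
import random

def zscore(rows):
    """Normalize each feature column to zero mean and unit variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TwoBranchDAE:
    """Hypothetical one-hidden-layer denoising autoencoder whose hidden units
    are split into a 'neutral' block and an 'emotion' block. Nothing here
    forces the blocks to specialize; that mechanism is not in the abstract."""

    def __init__(self, d, h_neutral, h_emotion, seed=0):
        self.rng = random.Random(seed)
        self.d, self.h_neutral = d, h_neutral
        self.h = h_neutral + h_emotion
        u = lambda: self.rng.uniform(-0.1, 0.1)
        self.W = [[u() for _ in range(d)] for _ in range(self.h)]  # encoder, h x d
        self.b = [0.0] * self.h
        self.V = [[u() for _ in range(self.h)] for _ in range(d)]  # decoder, d x h
        self.c = [0.0] * d

    def encode(self, x):
        """Return (neutral_block, emotion_block) hidden activations."""
        hid = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
               for row, bj in zip(self.W, self.b)]
        return hid[:self.h_neutral], hid[self.h_neutral:]

    def decode(self, hid):
        # Linear output layer (assumed), since normalized inputs are real-valued.
        return [sum(v * hj for v, hj in zip(row, hid)) + cj
                for row, cj in zip(self.V, self.c)]

    def corrupt(self, x, p):
        # Masking noise (assumed): zero each input dimension with probability p.
        return [0.0 if self.rng.random() < p else xi for xi in x]

    def train_step(self, x, lr=0.05, p=0.2):
        """One SGD step on 0.5 * ||decode(encode(corrupt(x))) - x||^2."""
        x_tilde = self.corrupt(x, p)
        hn, he = self.encode(x_tilde)
        hid = hn + he
        err = [xh - xi for xh, xi in zip(self.decode(hid), x)]  # vs. the CLEAN x
        # Backpropagate to hidden pre-activations before V is updated.
        dh = [sum(err[i] * self.V[i][j] for i in range(self.d)) * hid[j] * (1 - hid[j])
              for j in range(self.h)]
        for i in range(self.d):
            for j in range(self.h):
                self.V[i][j] -= lr * err[i] * hid[j]
            self.c[i] -= lr * err[i]
        for j in range(self.h):
            for k in range(self.d):
                self.W[j][k] -= lr * dh[j] * x_tilde[k]
            self.b[j] -= lr * dh[j]
        return 0.5 * sum(e * e for e in err)

    def reconstruction_error(self, rows):
        """Mean squared-error loss on uncorrupted inputs."""
        total = 0.0
        for x in rows:
            hn, he = self.encode(x)
            out = self.decode(hn + he)
            total += 0.5 * sum((xh - xi) ** 2 for xh, xi in zip(out, x))
        return total / len(rows)

# Toy stand-in for static acoustic features: 6 correlated dimensions.
rng = random.Random(1)
raw = []
for _ in range(40):
    a, b = rng.gauss(0, 1), rng.gauss(0, 1)
    raw.append([a, 2 * a, b, -b, a + b, a - b])
data = zscore(raw)

dae = TwoBranchDAE(d=6, h_neutral=4, h_emotion=4)
before = dae.reconstruction_error(data)
for epoch in range(50):
    for x in data:
        dae.train_step(x)
after = dae.reconstruction_error(data)
print(f"reconstruction error: {before:.3f} -> {after:.3f}")

# Hidden activations that could then be passed to an SVM (not shown here).
neutral, emotion = dae.encode(data[0])
```

Computing the error against the uncorrupted input, rather than the corrupted copy that was encoded, is what makes this a *denoising* autoencoder rather than a plain one. Which hidden block (or both) to pass to the classifier is left open here.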
Bibliographic reference. Xia, Rui / Liu, Yang (2013): "Using denoising autoencoder for emotion recognition", In INTERSPEECH-2013, 2886-2889.