Ladder Networks for Emotion Recognition: Using Unsupervised Auxiliary Tasks to Improve Predictions of Emotional Attributes

Srinivas Parthasarathy, Carlos Busso


Recognizing emotions using a few attribute dimensions such as arousal, valence, and dominance provides the flexibility to effectively represent a complex range of emotional behaviors. Conventional methods for learning these emotional descriptors primarily train a separate model to recognize each attribute. Recent work has shown that learning these attributes together regularizes the models, leading to better feature representations. This study explores new forms of regularization by adding unsupervised auxiliary tasks that reconstruct hidden layer representations. These auxiliary tasks require denoising the hidden representations at every layer of an autoencoder. The framework relies on ladder networks, which use skip connections between encoder and decoder layers to learn powerful representations of emotional dimensions. The results show that ladder networks improve the performance of the system compared to baselines that learn each attribute individually and to conventional denoising autoencoders. Furthermore, the unsupervised auxiliary tasks have promising potential to be used in a semi-supervised setting, where few labeled sentences are available.
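
The objective described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `ladder_costs`, the tanh activations, the mean-squared-error losses, the noise level, the layer sizes, and the fixed averaging used to combine the skip connection with the top-down signal are all assumptions made for illustration (ladder networks normally learn that combination). The sketch only shows how a supervised cost for the emotional attributes can be paired with a layer-wise denoising cost.

```python
import numpy as np

def ladder_costs(x, y, enc_w, dec_w, out_w, noise_std=0.3, lambdas=None, rng=None):
    """Return (supervised_cost, reconstruction_cost) for one batch.

    Hypothetical helper: enc_w[i] maps layer i to layer i+1,
    dec_w[i] maps layer i+1 back to layer i, out_w maps the top
    layer to the attribute predictions.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n_layers = len(enc_w)
    lambdas = [1.0] * (n_layers + 1) if lambdas is None else lambdas

    # Clean encoder pass: its activations are the denoising targets.
    clean = [x]
    for w in enc_w:
        clean.append(np.tanh(clean[-1] @ w))

    # Noisy encoder pass: Gaussian noise is injected at every layer.
    noisy = [x + noise_std * rng.standard_normal(x.shape)]
    for w in enc_w:
        h = np.tanh(noisy[-1] @ w)
        noisy.append(h + noise_std * rng.standard_normal(h.shape))

    # Supervised task: predict all attributes jointly from the noisy top layer.
    pred = noisy[-1] @ out_w
    supervised = np.mean((pred - y) ** 2)

    # Decoder pass, top-down. Each layer combines the signal from above
    # with the skip (lateral) connection from the noisy encoder.
    # Assumption: a fixed average stands in for a learned combinator.
    recon = 0.0
    top_down = noisy[-1]
    for l in range(n_layers, -1, -1):
        z_hat = 0.5 * (noisy[l] + top_down)
        recon += lambdas[l] * np.mean((z_hat - clean[l]) ** 2)
        if l > 0:
            top_down = z_hat @ dec_w[l - 1]
    return supervised, recon

# Toy data: 5 samples, 8 input features, 3 attributes
# (arousal, valence, dominance). All sizes are illustrative.
rng = np.random.default_rng(0)
dims = [8, 6, 4]
enc_w = [0.1 * rng.standard_normal((dims[i], dims[i + 1])) for i in range(2)]
dec_w = [0.1 * rng.standard_normal((dims[i + 1], dims[i])) for i in range(2)]
out_w = 0.1 * rng.standard_normal((4, 3))
x = rng.standard_normal((5, 8))
y = rng.standard_normal((5, 3))

sup, rec = ladder_costs(x, y, enc_w, dec_w, out_w, rng=rng)
total = sup + rec  # labeled batch: both terms contribute
```

In a semi-supervised setting of the kind the abstract mentions, an unlabeled batch would contribute only the reconstruction term, since that term does not depend on `y`.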


DOI: 10.21437/Interspeech.2018-1391

Cite as: Parthasarathy, S., Busso, C. (2018) Ladder Networks for Emotion Recognition: Using Unsupervised Auxiliary Tasks to Improve Predictions of Emotional Attributes. Proc. Interspeech 2018, 3698-3702, DOI: 10.21437/Interspeech.2018-1391.


@inproceedings{Parthasarathy2018,
  author={Srinivas Parthasarathy and Carlos Busso},
  title={Ladder Networks for Emotion Recognition: Using Unsupervised Auxiliary Tasks to Improve Predictions of Emotional Attributes},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3698--3702},
  doi={10.21437/Interspeech.2018-1391},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1391}
}