An appealing way to represent emotions is through emotional attributes such as arousal (passive versus active), valence (negative versus positive) and dominance (weak versus strong). While previous studies have treated these dimensions as orthogonal descriptors of emotion, there is strong theoretical and practical evidence that the attributes are interrelated. This observation suggests that predicting emotional attributes with a unified framework should outperform machine learning algorithms that predict each attribute separately. This study presents methods to jointly learn emotional attributes by exploiting their interdependencies. The framework relies on multi-task learning (MTL) implemented with deep neural networks (DNNs) with shared hidden layers. It provides a principled approach to learning shared feature representations that maximize the performance of the regression models. The results of within-corpus and cross-corpora evaluations show the benefits of MTL over single-task learning (STL). MTL achieves gains in concordance correlation coefficient (CCC) as high as 4.7% for within-corpus evaluations and 14.0% for cross-corpora evaluations. A visualization of the activations of the last hidden layers illustrates that MTL creates better feature representations. The best structure has shared layers followed by attribute-dependent layers, which better captures the relation between attributes.
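For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how CCC can be computed and how a joint objective over the three attributes might be formed. The abstract does not specify the training loss, so the weighted sum of `1 - CCC` terms, the function names `ccc` and `mtl_loss`, and the equal default weights are assumptions made for illustration only, not the authors' stated formulation.

```python
import numpy as np


def ccc(y_true, y_pred):
    """Concordance correlation coefficient between two 1-D sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    # Population (biased) variance and covariance, used consistently.
    var_t, var_p = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


def mtl_loss(targets, preds, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Hypothetical joint loss: weighted sum of (1 - CCC) per attribute.

    targets, preds: arrays of shape (n_samples, 3), with columns assumed
    to be ordered as arousal, valence, dominance.
    weights: illustrative per-attribute weights (an assumption).
    """
    targets = np.asarray(targets, dtype=float)
    preds = np.asarray(preds, dtype=float)
    return sum(
        w * (1.0 - ccc(targets[:, k], preds[:, k]))
        for k, w in enumerate(weights)
    )
```

Unlike Pearson correlation, CCC penalizes differences in mean and scale between predictions and labels, so a prediction that is perfectly correlated but shifted by a constant offset scores below 1.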
Cite as: Parthasarathy, S., Busso, C. (2017) Jointly Predicting Arousal, Valence and Dominance with Multi-Task Learning. Proc. Interspeech 2017, 1103-1107, doi: 10.21437/Interspeech.2017-1494
@inproceedings{parthasarathy17_interspeech,
  author={Srinivas Parthasarathy and Carlos Busso},
  title={{Jointly Predicting Arousal, Valence and Dominance with Multi-Task Learning}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1103--1107},
  doi={10.21437/Interspeech.2017-1494}
}