ISCA Archive Interspeech 2017

Discretized Continuous Speech Emotion Recognition with Multi-Task Deep Recurrent Neural Network

Duc Le, Zakaria Aldeneh, Emily Mower Provost

Estimating continuous emotional states from speech as a function of time has traditionally been framed as a regression problem. In this paper, we present a novel approach that moves the problem into the classification domain by discretizing the training labels at different resolutions. We employ a multi-task deep bidirectional long short-term memory (BLSTM) recurrent neural network (RNN) trained with a cost-sensitive cross-entropy loss to model these labels jointly. We introduce an emotion decoding algorithm that incorporates long- and short-term temporal properties of the signal to produce more robust time series estimates. We show that our proposed approach achieves competitive audio-only performance on the RECOLA dataset, relative to previously published works as well as other strong regression baselines. This work provides a link between regression and classification, and contributes an alternative approach for continuous emotion recognition.
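The core idea of discretizing continuous labels at multiple resolutions can be illustrated with a small sketch. The helper below is a hypothetical illustration, not the authors' code: the label range, bin edges, and number of resolutions are assumptions chosen for the example.

```python
import numpy as np

def discretize(labels, n_bins, lo=-1.0, hi=1.0):
    """Map continuous emotion annotations in [lo, hi] to integer class
    indices in {0, ..., n_bins - 1}. Hypothetical helper: the annotation
    range and uniform bin edges are assumptions, not from the paper."""
    edges = np.linspace(lo, hi, n_bins + 1)
    # np.digitize against the interior edges yields 0-based bin indices
    return np.digitize(labels, edges[1:-1])

# One continuous arousal trace discretized at two resolutions, so that a
# multi-task model could be trained to predict both sets of classes jointly.
trace = np.array([-0.9, -0.2, 0.1, 0.6, 0.95])
coarse = discretize(trace, 2)  # binary: low vs. high
fine = discretize(trace, 4)    # four-class, finer resolution
```

Each resolution then defines one classification task, and a multi-task network predicts all of them from shared representations.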


doi: 10.21437/Interspeech.2017-94

Cite as: Le, D., Aldeneh, Z., Provost, E.M. (2017) Discretized Continuous Speech Emotion Recognition with Multi-Task Deep Recurrent Neural Network. Proc. Interspeech 2017, 1108-1112, doi: 10.21437/Interspeech.2017-94

@inproceedings{le17b_interspeech,
  author={Duc Le and Zakaria Aldeneh and Emily Mower Provost},
  title={{Discretized Continuous Speech Emotion Recognition with Multi-Task Deep Recurrent Neural Network}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1108--1112},
  doi={10.21437/Interspeech.2017-94}
}