Emotion Recognition from Human Speech Using Temporal Information and Deep Learning

John Kim, Rif A. Saurous


Emotion recognition by machine is a challenging task, but it has great potential to enable empathic human-machine communication. In conventional approaches consisting of feature-extraction and classifier stages, extensive effort has been devoted to developing good feature representations, but relatively little to making proper use of the important temporal information in these features. In this paper, we propose a model that combines features known to be useful for emotion recognition with deep neural networks that exploit temporal information when recognizing emotional state. A benchmark evaluation on EMO-DB demonstrates that the proposed model achieves state-of-the-art performance, with a recognition rate of 88.9%.
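The abstract does not spell out the paper's architecture, but the core idea of exploiting temporal information with a deep network can be illustrated with a hypothetical, minimal sketch: a simple recurrent layer stepping over frame-level acoustic features (e.g., MFCC-like vectors), followed by a softmax over emotion classes. All names, dimensions, and weights below are illustrative assumptions, not the authors' actual model.

```python
import numpy as np

def emotion_rnn_forward(frames, Wx, Wh, Wo):
    """Hypothetical sketch: run a simple Elman RNN over frame-level
    features and return emotion-class probabilities from the final
    hidden state. Not the paper's actual architecture.

    frames: (T, D) array of per-frame acoustic features.
    Wx: (D, H) input-to-hidden weights.
    Wh: (H, H) recurrent weights -- these carry temporal information
        forward from frame to frame.
    Wo: (H, C) hidden-to-output weights for C emotion classes.
    """
    h = np.zeros(Wh.shape[0])
    for x in frames:                   # step through time: temporal
        h = np.tanh(x @ Wx + h @ Wh)   # context accumulates in h
    logits = h @ Wo
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

# Toy usage with random weights and random "features";
# EMO-DB has 7 emotion categories, hence C = 7.
rng = np.random.default_rng(0)
T, D, H, C = 50, 13, 32, 7
probs = emotion_rnn_forward(
    rng.standard_normal((T, D)),
    rng.standard_normal((D, H)) * 0.1,
    rng.standard_normal((H, H)) * 0.1,
    rng.standard_normal((H, C)) * 0.1,
)
print(probs.shape)  # one probability per emotion class
```

In practice such a recurrent layer would be trained end-to-end (e.g., with cross-entropy loss over labeled utterances); the sketch only shows how per-frame features are consumed sequentially rather than pooled into a single utterance-level vector.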


DOI: 10.21437/Interspeech.2018-1132

Cite as: Kim, J., Saurous, R.A. (2018) Emotion Recognition from Human Speech Using Temporal Information and Deep Learning. Proc. Interspeech 2018, 937-940, DOI: 10.21437/Interspeech.2018-1132.


@inproceedings{Kim2018,
  author={John Kim and Rif A. Saurous},
  title={Emotion Recognition from Human Speech Using Temporal Information and Deep Learning},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={937--940},
  doi={10.21437/Interspeech.2018-1132},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1132}
}