An End-to-End Deep Learning Framework for Speech Emotion Recognition of Atypical Individuals

Dengke Tang, Junlin Zeng, Ming Li


The goal of the ongoing ComParE 2018 Atypical Affect sub-challenge is to recognize the emotional states of atypical individuals. In this work, we present three modeling methods under the end-to-end learning framework: a CNN combined with extended features, a CNN+RNN, and a ResNet. We also investigate multiple data augmentation, balancing, and sampling methods to further improve system performance. The experimental results show that data balancing and augmentation increase the unweighted average recall (UAR) by 10% absolute. After score-level fusion, our proposed system achieves 48.8% UAR on the development set.
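
For readers unfamiliar with the metric and the fusion step, the following is a minimal sketch of how UAR and a simple score-level fusion could be computed. It is not the authors' implementation: the function names `unweighted_average_recall` and `fuse_scores`, the weighted-average fusion rule, and the toy labels and scores are all assumptions made for illustration, since the abstract does not specify how the scores are combined.

```python
import numpy as np


def unweighted_average_recall(y_true, y_pred, num_classes):
    """Mean of the per-class recalls, so every class counts equally."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = []
    for c in range(num_classes):
        mask = y_true == c
        if mask.sum() == 0:
            continue  # skip classes that never occur in the reference labels
        recalls.append(np.mean(y_pred[mask] == c))
    return float(np.mean(recalls))


def fuse_scores(score_matrices, weights=None):
    """Weighted average of per-system class scores.

    Assumed fusion rule (hypothetical): each matrix has shape
    (num_utterances, num_classes) and the weights are normalized to sum to 1.
    """
    stacked = np.stack([np.asarray(s, dtype=float) for s in score_matrices])
    if weights is None:
        weights = np.ones(len(score_matrices))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    # Contract the system axis: (S,) x (S, N, C) -> (N, C)
    return np.tensordot(weights, stacked, axes=1)


# Toy example with an imbalanced two-class reference (made-up data).
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 1, 1]
uar = unweighted_average_recall(y_true, y_pred, num_classes=2)

# Two hypothetical systems scoring two utterances over two classes.
system_a = [[0.6, 0.4], [0.2, 0.8]]
system_b = [[0.2, 0.8], [0.4, 0.6]]
fused = fuse_scores([system_a, system_b])
fused_labels = fused.argmax(axis=1)
```

In the toy example, plain accuracy is 3/4 while UAR is the mean of the two class recalls (2/3 and 1), which illustrates why UAR is preferred when classes are imbalanced: a minority class contributes as much to the score as a majority class.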


DOI: 10.21437/Interspeech.2018-2581

Cite as: Tang, D., Zeng, J., Li, M. (2018) An End-to-End Deep Learning Framework for Speech Emotion Recognition of Atypical Individuals. Proc. Interspeech 2018, 162-166, DOI: 10.21437/Interspeech.2018-2581.


@inproceedings{Tang2018,
  author={Dengke Tang and Junlin Zeng and Ming Li},
  title={An End-to-End Deep Learning Framework for Speech Emotion Recognition of Atypical Individuals},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={162--166},
  doi={10.21437/Interspeech.2018-2581},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2581}
}