ISCA Archive IberSPEECH 2022

Cross-Corpus Speech Emotion Recognition with HuBERT Self-Supervised Representation

Miguel Pastor, Dayana Ribas, Alfonso Ortega, Antonio Miguel, Eduardo Lleida

Speech Emotion Recognition (SER) is a task relevant to many applications in human-machine interaction. However, the lack of suitable emotional speech datasets limits the performance of SER systems. Successful training requires a large amount of labeled data, especially for current Deep Neural Network (DNN)-based solutions. Previous works have explored different strategies for extending the training set with available emotional speech corpora. In this paper, we evaluate the impact of cross-corpus training as a data augmentation strategy on the performance of an SER system, for both spectral representations and the recent Self-Supervised (SS) representation HuBERT. Experimental results show improvements in SER accuracy on the IEMOCAP dataset when extending the training set with two other datasets, EmoDB in German and RAVDESS in English.


doi: 10.21437/IberSPEECH.2022-16

Cite as: Pastor, M., Ribas, D., Ortega, A., Miguel, A., Lleida, E. (2022) Cross-Corpus Speech Emotion Recognition with HuBERT Self-Supervised Representation. Proc. IberSPEECH 2022, 76-80, doi: 10.21437/IberSPEECH.2022-16

@inproceedings{pastor22_iberspeech,
  author={Miguel Pastor and Dayana Ribas and Alfonso Ortega and Antonio Miguel and Eduardo Lleida},
  title={{Cross-Corpus Speech Emotion Recognition with HuBERT Self-Supervised Representation}},
  year=2022,
  booktitle={Proc. IberSPEECH 2022},
  pages={76--80},
  doi={10.21437/IberSPEECH.2022-16}
}