ISCA Archive IberSPEECH 2022

Speech emotion recognition in Spanish TV Debates

Irune Zubiaga, Raquel Justo, M. Inés Torres, Mikel De Velasco

Emotion recognition from speech is an active field of study that can help build more natural human–machine interaction systems. Although advances in deep learning have brought improvements in this task, it remains a very challenging field. In real-life scenarios, for instance, factors such as a tendency toward neutrality or the ambiguous definition of emotion can make labeling difficult, causing the dataset to be severely imbalanced and not very representative. In this work we considered a real-life scenario to carry out a series of emotion classification experiments. Specifically, we worked with a labeled corpus consisting of a set of audio recordings from Spanish TV debates and their respective transcriptions. First, an analysis of the emotional information within the corpus was conducted. Then, different data representations were analyzed in order to choose the best one for our task: spectrograms and UniSpeech-SAT were used for audio representation, and DistilBERT for text representation. As a final step, multimodal machine learning was used with the aim of improving the obtained classification results by combining acoustic and textual information.


doi: 10.21437/IberSPEECH.2022-38

Cite as: Zubiaga, I., Justo, R., Torres, M.I., Velasco, M.D. (2022) Speech emotion recognition in Spanish TV Debates. Proc. IberSPEECH 2022, 186-190, doi: 10.21437/IberSPEECH.2022-38

@inproceedings{zubiaga22_iberspeech,
  author={Irune Zubiaga and Raquel Justo and M. Inés Torres and Mikel De Velasco},
  title={{Speech emotion recognition in Spanish TV Debates}},
  year=2022,
  booktitle={Proc. IberSPEECH 2022},
  pages={186--190},
  doi={10.21437/IberSPEECH.2022-38}
}