ISCA Archive SPSC 2022

Zero-shot cross-lingual speech emotion recognition: a study of loss functions and feature importance

Sneha Das, Nicole Nadine Lønfeldt, Anne Katrine Pagsberg, Line H. Clemmensen, Nicklas Leander Lund

Deep learning has led to rapid advances in speech emotion recognition (SER), enabling its deployment across a wide range of applications and sectors. However, conventional challenges such as generalizing to unseen corpora and languages, and newer challenges such as the lack of interpretability and transparency of deep learning models, impact the security of these methods and thereby negatively influence their usability and acceptability in real-world applications. Here, we address this gap by investigating how the formulation and design of the learning function influence the ability to transfer emotion representations learned in one language to other languages. Furthermore, we examine the importance of the different feature groups for the emotion classes, and the associations between the feature groups and the learning functions. From the evaluation, we conclude that the dimensional model of emotion is more transferable to unseen languages than emotion classes, and that within it activation transfers better than valence. However, this transferability does not necessarily translate to higher classification accuracy.


doi: 10.21437/SPSC.2022-5

Cite as: Das, S., Lønfeldt, N.N., Pagsberg, A.K., Clemmensen, L.H., Lund, N.L. (2022) Zero-shot cross-lingual speech emotion recognition: a study of loss functions and feature importance. Proc. 2nd Symposium on Security and Privacy in Speech Communication, 23-29, doi: 10.21437/SPSC.2022-5

@inproceedings{das22_spsc,
  author={Sneha Das and Nicole Nadine Lønfeldt and Anne Katrine Pagsberg and Line H. Clemmensen and Nicklas Leander Lund},
  title={{Zero-shot cross-lingual speech emotion recognition: a study of loss functions and feature importance}},
  year=2022,
  booktitle={Proc. 2nd Symposium on Security and Privacy in Speech Communication},
  pages={23--29},
  doi={10.21437/SPSC.2022-5}
}