Towards Discriminative Representations and Unbiased Predictions: Class-Specific Angular Softmax for Speech Emotion Recognition

Zhixuan Li, Liang He, Jingyang Li, Li Wang, Wei-Qiang Zhang


Speech emotion recognition (SER) is a challenging task for two reasons: complex emotional expressions make different emotions hard to discriminate, and unbalanced data misleads models into giving biased predictions. In this work, we tackle both problems with the angular softmax loss. First, we replace the vanilla softmax with angular softmax to learn emotional representations with strong discriminative power. Second, inspired by its novel geometric interpretation, we establish a general calculation model and derive a concise formula for the decision domain. Based on these derivations, we propose our solution to data imbalance: class-specific angular softmax, by which we can directly adjust the decision domains of different emotion classes. Experimental results on the IEMOCAP corpus show significant improvements on two state-of-the-art models, demonstrating the effectiveness of the proposed methods.
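To make the idea concrete, here is a minimal pure-Python sketch of an angular softmax loss for a single sample. It uses the widely known A-Softmax angle function psi(theta) = (-1)^k cos(m*theta) - 2k. The abstract does not give the paper's formulas, so the per-class integer margin list `margins` is only an assumption about how a class-specific variant might be parameterized, and the names `psi` and `angular_softmax_loss` are hypothetical.

```python
import math

def psi(theta, m):
    """A-Softmax angle function: (-1)^k * cos(m*theta) - 2k on [k*pi/m, (k+1)*pi/m]."""
    k = min(int(theta * m / math.pi), m - 1)  # index of the angular segment
    return (-1) ** k * math.cos(m * theta) - 2 * k

def angular_softmax_loss(x, weights, label, margins):
    """Cross-entropy over angular logits for one sample.

    x       : feature vector (list of floats)
    weights : one weight vector per class (normalized inside the function)
    label   : index of the true class
    margins : ASSUMPTION, one integer margin m_c per class (m_c = 1 means no margin)
    """
    x_norm = math.sqrt(sum(v * v for v in x))
    logits = []
    for c, w in enumerate(weights):
        w_norm = math.sqrt(sum(v * v for v in w))
        cos_t = sum(a * b for a, b in zip(x, w)) / (x_norm * w_norm)
        cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding error
        if c == label:
            # The margin is applied only to the target-class angle.
            logits.append(x_norm * psi(math.acos(cos_t), margins[c]))
        else:
            logits.append(x_norm * cos_t)
    # Numerically stable log-sum-exp.
    top = max(logits)
    lse = top + math.log(sum(math.exp(v - top) for v in logits))
    return lse - logits[label]

x = [1.0, 0.5]
weights = [[1.0, 0.0], [0.0, 1.0]]
loss_plain = angular_softmax_loss(x, weights, 0, [1, 1])   # no margin
loss_margin = angular_softmax_loss(x, weights, 0, [2, 1])  # margin 2 on class 0
```

In this sketch a larger margin for a class makes its target logit smaller for the same angle, so the same sample incurs a higher loss and must lie closer to the class weight direction to be fit well. How the paper actually chooses and applies class-specific margins to rebalance decision domains is not stated in the abstract.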


DOI: 10.21437/Interspeech.2019-1683

Cite as: Li, Z., He, L., Li, J., Wang, L., Zhang, W. (2019) Towards Discriminative Representations and Unbiased Predictions: Class-Specific Angular Softmax for Speech Emotion Recognition. Proc. Interspeech 2019, 1696-1700, DOI: 10.21437/Interspeech.2019-1683.


@inproceedings{Li2019,
  author={Zhixuan Li and Liang He and Jingyang Li and Li Wang and Wei-Qiang Zhang},
  title={{Towards Discriminative Representations and Unbiased Predictions: Class-Specific Angular Softmax for Speech Emotion Recognition}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={1696--1700},
  doi={10.21437/Interspeech.2019-1683},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1683}
}