On Enhancing Speech Emotion Recognition Using Generative Adversarial Networks

Saurabh Sahu, Rahul Gupta, Carol Espy-Wilson


Generative Adversarial Networks (GANs) have gained a lot of attention from the machine learning community due to their ability to learn and mimic an input data distribution. A GAN consists of a discriminator and a generator that play a min-max game against each other; through this game the generator learns the complex underlying data distribution when fed with data points sampled from a simpler distribution, such as a Uniform or Gaussian. Once trained, GANs allow synthetic generation of examples sampled from the learned distribution. We investigate the application of GANs to generate synthetic feature vectors used for speech emotion recognition. Specifically, we investigate two setups: (i) a vanilla GAN that learns the distribution of a lower-dimensional representation of the actual higher-dimensional feature vectors and (ii) a conditional GAN that learns the distribution of the higher-dimensional feature vectors conditioned on their labels, that is, the emotional classes to which they belong. As a potential practical application of these synthetically generated samples, we measure any improvement in a classifier's performance when the synthetic data is used along with real data to train it. We perform cross-validation analyses followed by a cross-corpus study.
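
For reference, the min-max game mentioned above is usually written as follows. This is the standard textbook GAN objective and its common conditional variant, not necessarily the exact loss used in this paper. The symbols are assumptions introduced here for illustration: G is the generator, D the discriminator, x a real feature vector, z a noise vector drawn from the simple prior p_z (Uniform or Gaussian), and y an emotion class label.

```latex
% Vanilla GAN: D maximizes and G minimizes the same value function.
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \big[ \log D(x) \big]
  + \mathbb{E}_{z \sim p_z(z)} \big[ \log \big( 1 - D(G(z)) \big) \big]

% Conditional GAN: both networks additionally receive the label y.
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{(x, y) \sim p_{\mathrm{data}}(x, y)} \big[ \log D(x \mid y) \big]
  + \mathbb{E}_{z \sim p_z(z),\; y \sim p(y)} \big[ \log \big( 1 - D(G(z \mid y) \mid y) \big) \big]
```

In the first form the generator learns to sample from the data distribution as a whole; in the second, fixing y lets it produce samples for a chosen emotion class, which is what makes labeled synthetic training data possible.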


DOI: 10.21437/Interspeech.2018-1883

Cite as: Sahu, S., Gupta, R., Espy-Wilson, C. (2018) On Enhancing Speech Emotion Recognition Using Generative Adversarial Networks. Proc. Interspeech 2018, 3693-3697, DOI: 10.21437/Interspeech.2018-1883.


@inproceedings{Sahu2018,
  author={Saurabh Sahu and Rahul Gupta and Carol Espy-Wilson},
  title={On Enhancing Speech Emotion Recognition Using Generative Adversarial Networks},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3693--3697},
  doi={10.21437/Interspeech.2018-1883},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1883}
}