Data Augmentation Using GANs for Speech Emotion Recognition

Aggelina Chatziagapi, Georgios Paraskevopoulos, Dimitris Sgouropoulos, Georgios Pantazopoulos, Malvina Nikandrou, Theodoros Giannakopoulos, Athanasios Katsamanis, Alexandros Potamianos, Shrikanth Narayanan

In this work, we address the problem of data imbalance in Speech Emotion Recognition (SER). We investigate conditioned data augmentation using Generative Adversarial Networks (GANs) to generate samples for underrepresented emotions. We adapt and improve a conditional GAN architecture to generate synthetic spectrograms for the minority class. For comparison, we implement a series of signal-based data augmentation methods. The proposed GAN-based approach is evaluated on two datasets: IEMOCAP and FEEL-25k, a large multi-domain dataset. Results demonstrate a relative performance improvement of 10% on IEMOCAP and 5% on FEEL-25k when augmenting the minority classes.

DOI: 10.21437/Interspeech.2019-2561

Cite as: Chatziagapi, A., Paraskevopoulos, G., Sgouropoulos, D., Pantazopoulos, G., Nikandrou, M., Giannakopoulos, T., Katsamanis, A., Potamianos, A., Narayanan, S. (2019) Data Augmentation Using GANs for Speech Emotion Recognition. Proc. Interspeech 2019, 171-175, DOI: 10.21437/Interspeech.2019-2561.

  author={Aggelina Chatziagapi and Georgios Paraskevopoulos and Dimitris Sgouropoulos and Georgios Pantazopoulos and Malvina Nikandrou and Theodoros Giannakopoulos and Athanasios Katsamanis and Alexandros Potamianos and Shrikanth Narayanan},
  title={{Data Augmentation Using GANs for Speech Emotion Recognition}},
  booktitle={Proc. Interspeech 2019},