Front-End Feature Compensation and Denoising for Noise Robust Speech Emotion Recognition

Rupayan Chakraborty, Ashish Panda, Meghna Pandharipande, Sonal Joshi, Sunil Kumar Kopparapu


Front-end processing is one way to impart noise robustness to speech emotion recognition systems in mismatched scenarios. Here, we implement and compare different front-end robustness techniques for their efficacy in speech emotion recognition. First, we use a feature compensation technique based on the Vector Taylor Series (VTS) expansion of noisy Mel-Frequency Cepstral Coefficients (MFCCs). Next, we improve upon this feature compensation technique by augmenting the VTS expansion with an auditory masking formulation. We also investigate the applicability of 10th-root compression in MFCC computation. Further, a Time Delay Neural Network based Denoising Autoencoder (TDNN-DAE) is implemented to estimate clean MFCCs from noisy MFCCs. These techniques have not previously been investigated for their suitability to the robust speech emotion recognition task. The performance of these front-end techniques is compared with a Non-Negative Matrix Factorization (NMF) based front-end. Through extensive experiments on two standard databases (EmoDB and IEMOCAP) contaminated with five types of noise, we show that these techniques provide significant performance gains in the emotion recognition task. We also show that, along with front-end compensation, applying feature selection to non-MFCC high-level descriptors results in better performance.
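Of the techniques named above, two admit a compact illustration. For the VTS feature compensation, the abstract does not spell out the model, but the standard mismatch function from the VTS literature relates the clean cepstra x, additive noise n, and convolutive channel h to the observed noisy cepstra y, with C the DCT matrix:

y \approx x + h + C \log\left(1 + \exp\left(C^{-1}(n - x - h)\right)\right)

VTS compensation linearizes this nonlinearity with a first-order Taylor expansion around current estimates of x, n, and h, then inverts the linearized model to recover an estimate of the clean cepstra.

The 10th-root compression is a one-line change to MFCC extraction: the logarithm applied to the Mel filterbank energies is replaced by raising them to the power 0.1. A minimal Python/NumPy sketch, assuming the filterbank energies are already computed (the function name and flooring constant are illustrative, not taken from the paper):

import numpy as np
from scipy.fftpack import dct

def cepstra_from_mel_energies(mel_energies, n_ceps=13, compression="log"):
    """Cepstral coefficients from Mel filterbank energies.

    mel_energies: (n_frames, n_mels) array of non-negative energies.
    compression:  "log" gives standard MFCCs; "root" applies the
                  10th-root compression instead of the logarithm.
    """
    eps = 1e-10  # floor to keep log/root well behaved at zero energy
    if compression == "log":
        compressed = np.log(mel_energies + eps)
    elif compression == "root":
        compressed = np.power(mel_energies + eps, 0.1)  # 10th root
    else:
        raise ValueError("compression must be 'log' or 'root'")
    # DCT-II decorrelates the compressed energies, as in standard MFCCs
    return dct(compressed, type=2, axis=-1, norm="ortho")[:, :n_ceps]

Root compression of this kind has appeared in the robust ASR literature as an alternative to the log nonlinearity; the paper's contribution is in evaluating it, alongside the other front-ends above, for the emotion recognition task rather than for speech recognition.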


DOI: 10.21437/Interspeech.2019-2243

Cite as: Chakraborty, R., Panda, A., Pandharipande, M., Joshi, S., Kopparapu, S.K. (2019) Front-End Feature Compensation and Denoising for Noise Robust Speech Emotion Recognition. Proc. Interspeech 2019, 3257-3261, DOI: 10.21437/Interspeech.2019-2243.


@inproceedings{Chakraborty2019,
  author={Rupayan Chakraborty and Ashish Panda and Meghna Pandharipande and Sonal Joshi and Sunil Kumar Kopparapu},
  title={{Front-End Feature Compensation and Denoising for Noise Robust Speech Emotion Recognition}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3257--3261},
  doi={10.21437/Interspeech.2019-2243},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2243}
}