Investigating Speech Enhancement and Perceptual Quality for Speech Emotion Recognition

Anderson R. Avila, Md Jahangir Alam, Douglas O'Shaughnessy, Tiago Falk


This study investigates the performance of two enhancement algorithms, both in terms of perceptual quality and with respect to their impact on speech emotion recognition (SER). The SER system adopted is based on the benchmark system provided for the AVEC Challenge 2016. Three objective quality measures are adopted: the speech-to-reverberation modulation energy ratio (SRMR), the perceptual evaluation of speech quality (PESQ), and the perceptual objective listening quality assessment (POLQA). Evaluations are conducted on speech files from the RECOLA dataset, which provides spontaneous interactions in French from 27 subjects. Clean speech files are corrupted with different levels of background noise and reverberation. Results show that applying enhancement prior to the SER task can improve SER performance in the more degraded scenarios. We also show that quality measures can be an important asset as indicators of how well an enhancement algorithm serves SER, with SRMR and POLQA providing the most reliable results.
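
The abstract does not say how the clean files were corrupted or which noise levels were used. As a rough illustration only, the sketch below shows one common way to mix a noise recording into clean speech at a chosen signal-to-noise ratio (SNR). The function names (`scale_noise_to_snr`, `add_noise`, `measured_snr_db`) and the use of plain Python lists of samples are assumptions made for this example, not the authors' pipeline, and reverberation (typically simulated by convolving with a room impulse response) is not covered.

```python
import math


def mean_power(samples):
    """Average power (mean of squared samples) of a signal."""
    return sum(x * x for x in samples) / len(samples)


def scale_noise_to_snr(clean, noise, snr_db):
    """Scale `noise` so that power(clean) / power(scaled noise) matches snr_db.

    Hypothetical helper for illustration; both inputs are lists of floats.
    """
    clean_power = mean_power(clean)
    noise_power = mean_power(noise)
    if clean_power == 0 or noise_power == 0:
        raise ValueError("clean and noise signals must have non-zero power")
    # SNR_dB = 10 * log10(P_clean / P_noise)  =>  P_noise = P_clean / 10^(SNR_dB / 10)
    target_noise_power = clean_power / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [gain * n for n in noise]


def add_noise(clean, noise, snr_db):
    """Return clean + noise, with the noise scaled to the requested SNR."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    trimmed = noise[:len(clean)]  # use only as much noise as there is speech
    scaled = scale_noise_to_snr(clean, trimmed, snr_db)
    return [c + n for c, n in zip(clean, scaled)]


def measured_snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise component."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))
```

Calling `add_noise(clean, noise, 0)` yields a mixture in which speech and noise have equal power, and lower (or negative) SNR values correspond to the more degraded conditions the abstract refers to.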


DOI: 10.21437/Interspeech.2018-2350

Cite as: Avila, A.R., Alam, M.J., O'Shaughnessy, D., Falk, T. (2018) Investigating Speech Enhancement and Perceptual Quality for Speech Emotion Recognition. Proc. Interspeech 2018, 3663-3667, DOI: 10.21437/Interspeech.2018-2350.


@inproceedings{Avila2018,
  author={Anderson R. Avila and Md Jahangir Alam and Douglas O'Shaughnessy and Tiago Falk},
  title={Investigating Speech Enhancement and Perceptual Quality for Speech Emotion Recognition},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3663--3667},
  doi={10.21437/Interspeech.2018-2350},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2350}
}