16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Comparison of Gaussian Process Regression and Gaussian Mixture Models in Spectral Tilt Modelling for Intelligibility Enhancement of Telephone Speech

Emma Jokinen, Ulpu Remes, Paavo Alku

Aalto University, Finland

Intelligibility enhancement can be applied in mobile communications as a post-processing step when the background noise conditions are adverse. In this study, post-processing methods aiming to model the Lombard effect are investigated. More specifically, the study focuses on mapping the spectral tilt of normal speech to that of Lombard speech to improve the intelligibility of telephone speech in near-end noise conditions. Two modelling techniques, Gaussian mixture models (GMMs) and Gaussian processes (GPs), are evaluated with different amounts of training data. Normal-to-Lombard conversions implemented by GMMs and GPs are then compared objectively as well as in subjective intelligibility and quality tests against unprocessed speech in different noise conditions. All GMMs and GPs evaluated in the subjective tests improved intelligibility without a significant decrease in quality compared to unprocessed speech. While the best intelligibility results were obtained with a GP model, other GMM and GP alternatives were rated higher in quality. Based on the results, determining the best modelling technique for normal-to-Lombard mapping is challenging and calls for further studies.

Bibliographic reference. Jokinen, Emma / Remes, Ulpu / Alku, Paavo (2015): "Comparison of Gaussian process regression and Gaussian mixture models in spectral tilt modelling for intelligibility enhancement of telephone speech", in INTERSPEECH-2015, 85-89.