8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Robust Speech Recognition Using Model-Based Feature Enhancement

Veronique Stouten, Hugo van Hamme, Kris Demuynck, Patrick Wambacq

Katholieke Universiteit Leuven, Belgium

Maintaining a high level of robustness in Automatic Speech Recognition (ASR) systems is especially challenging when the background noise varies over time. We have implemented a Model-Based Feature Enhancement (MBFE) technique that can easily be embedded in the feature extraction module of a recogniser and is also intrinsically suited to the removal of non-stationary additive noise. To this end, we combine statistical models of the cepstral feature vectors of both clean speech and noise, using a Vector Taylor Series approximation in the power spectral domain. Based on this combined HMM, a global MMSE estimate of the clean speech is then calculated. Because the applied models are scalable, MBFE is flexible and computationally feasible. Recognition experiments with this feature enhancement technique on the Aurora2 connected digit recognition task showed significant improvements in the noise robustness of the HTK recogniser.
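
The model combination and MMSE step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and it makes several simplifying assumptions that are not in the abstract: a Gaussian mixture stands in for the clean-speech HMM, the noise is a single Gaussian, features are log-spectral rather than cepstral, all covariances are diagonal, and the Vector Taylor Series expansion is first order around the model means. The function names `vts_combine` and `mmse_enhance` are hypothetical.

```python
import numpy as np


def vts_combine(mu_x, var_x, mu_n, var_n):
    """First-order VTS of y = log(exp(x) + exp(n)), expanded at the means.

    mu_x, var_x: (K, D) clean-speech component means and variances.
    mu_n, var_n: (D,) noise mean and variance (assumed single Gaussian).
    Returns the noisy-speech mean, variance and the cross-covariance cov(x, y).
    """
    mu_y = np.logaddexp(mu_x, mu_n)      # mismatch function evaluated at the means
    g = np.exp(mu_x - mu_y)              # dy/dx at the expansion point, in (0, 1)
    var_y = g**2 * var_x + (1.0 - g)**2 * var_n
    cov_xy = g * var_x
    return mu_y, var_y, cov_xy


def mmse_enhance(y, weights, mu_x, var_x, mu_n, var_n):
    """MMSE estimate of the clean vector x given one noisy frame y of shape (D,)."""
    mu_y, var_y, cov_xy = vts_combine(mu_x, var_x, mu_n, var_n)
    # Log-likelihood of y under each combined (speech + noise) component.
    log_lik = -0.5 * np.sum(np.log(2.0 * np.pi * var_y) + (y - mu_y)**2 / var_y, axis=1)
    log_post = np.log(weights) + log_lik
    log_post -= np.logaddexp.reduce(log_post)   # normalise in the log domain
    post = np.exp(log_post)                     # p(k | y), shape (K,)
    # Per-component conditional mean E[x | y, k] of a jointly Gaussian (x, y).
    x_k = mu_x + cov_xy / var_y * (y - mu_y)
    return post @ x_k                           # posterior-weighted average, shape (D,)


# Toy usage with made-up parameters (K = 2 components, D = 3 channels).
weights = np.array([0.6, 0.4])
mu_x = np.array([[2.0, 3.0, 1.0], [4.0, 1.5, 2.5]])
var_x = np.full((2, 3), 0.5)
mu_n = np.array([1.0, 1.0, 1.0])
var_n = np.full(3, 0.2)
y = np.array([2.5, 3.0, 1.8])
x_hat = mmse_enhance(y, weights, mu_x, var_x, mu_n, var_n)
```

In this sketch the noise model enters only through `mu_n` and `var_n`, so a time-varying noise estimate could be supplied frame by frame; how MBFE itself tracks non-stationary noise is described in the full paper, not here.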


Bibliographic reference. Stouten, Veronique / Hamme, Hugo van / Demuynck, Kris / Wambacq, Patrick (2003): "Robust speech recognition using model-based feature enhancement", in EUROSPEECH-2003, 17-20.