9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Speech Recognition in Noisy Environments Using a Switching Linear Dynamic Model for Feature Enhancement

Björn Schuller (1), Martin Wöllmer (1), Tobias Moosmayr (2), Gerhard Rigoll (1)

(1) Technische Universität München, Germany; (2) BMW Group, Germany

The performance of automatic speech recognition systems degrades sharply whenever the speech signal is disturbed by background noise. We aim to improve noise robustness at all major levels of speech recognition: feature extraction, feature enhancement, and speech modeling. Different auditory modeling concepts, speech enhancement techniques, training strategies, and model architectures are implemented in an in-car digit and spelling recognition task. We show that joint speech and noise modeling, with a global Switching Linear Dynamic Model (SLDM) capturing the dynamics of speech and a Linear Dynamic Model (LDM) for noise, outperforms state-of-the-art speech enhancement techniques. Furthermore, we show that SLDM feature enhancement outperforms the baseline recognizer of the Interspeech Consonant Challenge 2008 on almost all of the noisy test sets.
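
The abstract names the SLDM and LDM without defining them, so a small illustration of the generative side of such a joint speech and noise model may help. The sketch below is not the authors' implementation. It assumes scalar features, a switching state drawn independently per frame, and additive mixing of speech and noise in a log-spectral domain. All function names and parameter values are hypothetical, and the enhancement step itself (inferring clean speech from the noisy observation) is not shown.

```python
import math
import random


def simulate_sldm(num_frames, A, b, var, seed=0):
    """Sample a scalar switching linear dynamic model (illustrative only).

    Per frame: x_t = A[s_t] * x_{t-1} + b[s_t] + v_t,  v_t ~ N(0, var[s_t]).
    Assumption: the switching state s_t is drawn uniformly and independently
    for every frame, i.e. there are no state transition probabilities.
    """
    rng = random.Random(seed)
    x = 0.0
    states, track = [], []
    for _ in range(num_frames):
        s = rng.randrange(len(A))  # pick which linear dynamics apply
        x = A[s] * x + b[s] + rng.gauss(0.0, math.sqrt(var[s]))
        states.append(s)
        track.append(x)
    return states, track


def simulate_ldm(num_frames, A, b, var, seed=1):
    """A linear dynamic model is the single-state special case of the above."""
    _, track = simulate_sldm(num_frames, [A], [b], [var], seed)
    return track


def combine_log_spectral(speech, noise):
    """Assumed observation model: y = log(exp(x) + exp(n)), computed stably."""
    return [
        max(x, n) + math.log1p(math.exp(-abs(x - n)))
        for x, n in zip(speech, noise)
    ]


if __name__ == "__main__":
    # Hypothetical parameters: two speech dynamics, one noise dynamic.
    states, speech = simulate_sldm(5, A=[0.9, 0.5], b=[0.1, 1.0], var=[0.01, 0.04])
    noise = simulate_ldm(5, A=0.8, b=0.2, var=0.02)
    for s, x, n, y in zip(states, speech, noise, combine_log_spectral(speech, noise)):
        print(f"state={s} speech={x:.3f} noise={n:.3f} noisy={y:.3f}")
```

In this toy setting the noisy observation is never smaller than either the speech or the noise value, which is the behavior an enhancement stage would have to invert to recover the clean speech feature.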

Bibliographic reference. Schuller, Björn / Wöllmer, Martin / Moosmayr, Tobias / Rigoll, Gerhard (2008): "Speech recognition in noisy environments using a switching linear dynamic model for feature enhancement", in INTERSPEECH-2008, 1789-1792.