The performance of automatic speech recognition systems degrades considerably whenever the speech signal is disturbed by background noise. We aim to improve noise robustness by addressing all major levels of speech recognition: feature extraction, feature enhancement, and speech modeling. Different auditory modeling concepts, speech enhancement techniques, training strategies, and model architectures are implemented in an in-car digit and spelling recognition task. We show that joint speech and noise modeling, with a global Switching Linear Dynamic Model (SLDM) capturing the dynamics of speech and a Linear Dynamic Model (LDM) for noise, prevails over state-of-the-art speech enhancement techniques. Furthermore, we demonstrate that the baseline recognizer of the Interspeech Consonant Challenge 2008 can be outperformed by SLDM feature enhancement for almost all of the noisy test sets.
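To make the two model types named above concrete, the following minimal sketch samples feature trajectories from an LDM (for noise) and from an SLDM (for speech). It covers only the generative side; the inference step that recovers clean speech features from noisy observations is not shown. All specifics are assumptions for illustration and are not taken from the paper: the function names `sample_ldm` and `sample_sldm`, the feature dimension, the parameter values, and the choice to draw the switching state independently for each frame.

```python
import numpy as np

def sample_ldm(A, b, C, x0, T, rng):
    """Sample x_t = A x_{t-1} + b + v_t with v_t ~ N(0, C) (hypothetical helper)."""
    d = x0.shape[0]
    x = np.empty((T, d))
    prev = x0
    for t in range(T):
        prev = A @ prev + b + rng.multivariate_normal(np.zeros(d), C)
        x[t] = prev
    return x

def sample_sldm(As, bs, Cs, prior, x0, T, rng):
    """Sample from a switching LDM: a discrete state s_t selects (A, b, C) per frame.

    Assumption: s_t is drawn independently for every frame from `prior`.
    """
    d = x0.shape[0]
    x = np.empty((T, d))
    s = rng.choice(len(prior), size=T, p=prior)
    prev = x0
    for t in range(T):
        k = s[t]
        prev = As[k] @ prev + bs[k] + rng.multivariate_normal(np.zeros(d), Cs[k])
        x[t] = prev
    return x, s

rng = np.random.default_rng(0)
d, T, K = 3, 50, 2  # illustrative sizes only

# Noise: a single linear dynamic model.
noise = sample_ldm(0.9 * np.eye(d), np.zeros(d), 0.01 * np.eye(d), np.zeros(d), T, rng)

# Speech: K linear dynamic models, one selected per frame by the switching state.
As = np.stack([0.8 * np.eye(d), 0.5 * np.eye(d)])
bs = np.stack([np.ones(d), -np.ones(d)])
Cs = np.stack([0.05 * np.eye(d)] * K)
speech, states = sample_sldm(As, bs, Cs, np.array([0.6, 0.4]), np.zeros(d), T, rng)
```

A single LDM suits slowly varying noise, whereas the switching state lets the speech model change its dynamics from frame to frame.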
Bibliographic reference. Schuller, Björn / Wöllmer, Martin / Moosmayr, Tobias / Rigoll, Gerhard (2008): "Speech recognition in noisy environments using a switching linear dynamic model for feature enhancement", In INTERSPEECH-2008, 1789-1792.