Machine Listening in Multisource Environments (CHiME) 2011
While much progress has been made in designing robust automatic speech recognition (ASR) systems, the combination of high noise levels and reverberant room acoustics still poses a major challenge even for state-of-the-art systems. This paper describes how robust ASR in such difficult environments can be approached by combining beamforming and missing data techniques.
The two techniques are combined by first estimating uncertainties of observation in the beamforming stage, in either the time or the frequency domain, and then transforming the observations, together with their associated uncertainties, to the domain of speech recognition. This strategy allows the use of reverberation-insensitive cepstral features, which can still be decoded robustly with the help of uncertainty information gained from the beamforming front end.
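As an illustration of how uncertainty information can enter decoding, the following minimal sketch scores a feature vector against a diagonal-covariance Gaussian while adding the per-feature observation variance to the model variance. This is one common form of uncertainty decoding and is not necessarily the exact rule used in the paper; the function name `log_likelihood_with_uncertainty` and all numeric values are assumptions made for the example.

```python
import math


def log_likelihood_with_uncertainty(feature_mean, feature_var, model_mean, model_var):
    """Log-likelihood of an uncertain feature vector under a diagonal Gaussian.

    Assumption (illustrative, not taken from the paper): each feature is
    itself Gaussian with mean feature_mean[i] and variance feature_var[i],
    so integrating it out adds feature_var[i] to the model variance.
    """
    total = 0.0
    for mu_x, var_x, mu_q, var_q in zip(feature_mean, feature_var, model_mean, model_var):
        var = var_q + var_x  # inflate the model variance by the observation uncertainty
        total += -0.5 * (math.log(2.0 * math.pi * var) + (mu_x - mu_q) ** 2 / var)
    return total


# A feature far from the model mean is penalized less when flagged as unreliable.
certain = log_likelihood_with_uncertainty([3.0], [0.0], [0.0], [1.0])
uncertain = log_likelihood_with_uncertainty([3.0], [8.0], [0.0], [1.0])
print(certain < uncertain)  # True
```

With zero observation variance the function reduces to the ordinary Gaussian log-likelihood, so fully reliable features are scored as in a conventional recognizer.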
We investigate a number of different preprocessing options. Somewhat surprisingly, a simple fixed delay-and-sum beamformer and a null-steering beamformer, when combined with uncertainty decoding techniques, yielded the most robust design among a much wider set of investigated techniques.
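For readers unfamiliar with the fixed delay-and-sum beamformer mentioned above, a minimal time-domain sketch follows. It assumes known, integer-sample delays per microphone and equal-length channels; the function name `delay_and_sum` and the toy signals are hypothetical, and a practical front end would typically handle fractional delays or work in the frequency domain.

```python
def delay_and_sum(channels, delays):
    """Average microphone channels after undoing known integer sample delays.

    channels: list of equal-length lists of samples, one list per microphone.
    delays:   delays[m] is the number of samples by which the target signal
              arrives late at microphone m (assumed known and fixed).
    """
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    num_samples = len(channels[0])
    output = []
    for n in range(num_samples):
        acc = 0.0
        for channel, delay in zip(channels, delays):
            idx = n + delay  # read ahead to compensate for the late arrival
            if 0 <= idx < num_samples:
                acc += channel[idx]
        output.append(acc / len(channels))
    return output


# Toy example: the same pulse reaches the second microphone one sample later.
mic0 = [0.0, 1.0, 2.0, 3.0, 0.0, 0.0]
mic1 = [0.0, 0.0, 1.0, 2.0, 3.0, 0.0]
aligned = delay_and_sum([mic0, mic1], [0, 1])
print(aligned)  # [0.0, 1.0, 2.0, 3.0, 0.0, 0.0]
```

After alignment the target adds coherently across channels, whereas uncorrelated noise is attenuated by the averaging.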
Index Terms. robustness, automatic speech recognition, beamforming, uncertainty decoding
Bibliographic reference. Kolossa, Dorothea / Fernandez Astudillo, Ramón / Abad, Alberto / Zeiler, Steffen / Saeidi, Rahim / Mowlaee, Pejman / Neto, João Paulo da Silva / Martin, Rainer (2011): "CHiME challenge: approaches to robustness using beamforming and uncertainty-of-observation techniques", In CHiME-2011, 6-11.