In this contribution we investigate the effectiveness of Bayesian feature enhancement (BFE) on a medium-sized recognition task comprising real-world recordings of noisy reverberant speech. BFE employs a very coarse model of the acoustic impulse response (AIR) from the source to the microphone. This model has been shown to be effective when the speech to be recognized is generated by artificially convolving nonreverberant speech with a constant AIR. Here we demonstrate that the model is also appropriate for feature enhancement of true recordings of noisy reverberant speech. On the Multi-Channel Wall Street Journal Audio Visual corpus (MC-WSJ-AV), using the signal of a single distant microphone as input and a single recognition pass, the word error rate is cut in half, to 41.9%, compared with the ETSI Standard Front-End.
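To illustrate what "artificially convolving nonreverberant speech with a constant AIR" means in practice, the following is a minimal Python sketch. The abstract does not specify the form of the coarse AIR model; the exponentially decaying white-noise response used here, parameterized by a reverberation time `t60_s`, is an assumption made purely for illustration. The function names `synthetic_air` and `convolve` are hypothetical, and the sketch does not implement BFE itself.

```python
import math
import random


def synthetic_air(t60_s, sample_rate_hz, length_s, seed=0):
    """Return an illustrative AIR: white Gaussian noise with exponential decay.

    Assumption (not taken from the abstract): the energy envelope drops by
    60 dB after t60_s seconds, so the amplitude envelope is exp(-delta * t)
    with delta = 3 * ln(10) / t60_s.
    """
    rng = random.Random(seed)
    num_taps = int(length_s * sample_rate_hz)
    delta = 3.0 * math.log(10.0) / t60_s
    return [
        rng.gauss(0.0, 1.0) * math.exp(-delta * k / sample_rate_hz)
        for k in range(num_taps)
    ]


def convolve(signal, air):
    """Full linear convolution of a dry signal with a fixed (constant) AIR."""
    out = [0.0] * (len(signal) + len(air) - 1)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue  # skip silent samples to save work
        for j, h in enumerate(air):
            out[i + j] += s * h
    return out


# Usage: a unit impulse as the "dry" input reproduces the AIR itself.
air = synthetic_air(t60_s=0.5, sample_rate_hz=8000, length_s=0.05)
dry = [1.0] + [0.0] * 9
wet = convolve(dry, air)
```

Data simulated this way uses one fixed impulse response per utterance, whereas in real recordings such as those in MC-WSJ-AV the true response is unknown and additive noise is present, which is the condition the abstract addresses.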
Index Terms: Bayesian feature enhancement, dereverberation, denoising
Bibliographic reference. Krueger, Alexander / Walter, Oliver / Leutnant, Volker / Haeb-Umbach, Reinhold (2012): "Bayesian feature enhancement for ASR of noisy reverberant real-world data", In INTERSPEECH-2012, 807-810.