COST278 and ISCA Tutorial and Research Workshop (ITRW) on Robustness Issues in Conversational Interaction
University of East Anglia, Norwich, UK
In this paper, we motivate the introduction of multiple feature streams to bridge the gap between the noise-free and the estimated features in the context of Model-Based Feature Enhancement (MBFE) for noise-robust speech recognition. Especially at low local SNR levels, the global MMSE estimate may not be optimal and its uncertainty is large. We therefore first show how a constrained quadratic optimisation problem can improve the linear combination weights in the MMSE formula. As an alternative, these weights are approximated by K Kronecker deltas. Both approaches are compared in recognition experiments on the Aurora2 task. Multiple Stream MBFE is also validated on the large-vocabulary Aurora4 benchmark task, where the average Word Error Rate decreases from 37.73% (no enhancement) to 26.13% (single stream MBFE) and finally to 24.89% (multiple stream MBFE).
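The difference between a single MMSE estimate and its approximation by K Kronecker deltas can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the MMSE estimate is a posterior-weighted sum of per-component clean-speech estimates, and that each Kronecker delta selects one of the K most probable components as a separate feature stream. The function names, the toy posteriors and the toy estimates are all hypothetical.

```python
# Illustrative sketch only; names and numbers are hypothetical, not from the paper.

def mmse_estimate(posteriors, component_estimates):
    """Single stream: posterior-weighted linear combination of the
    per-component estimates (the weights sum to one)."""
    dim = len(component_estimates[0])
    return [
        sum(p * est[d] for p, est in zip(posteriors, component_estimates))
        for d in range(dim)
    ]

def kronecker_streams(posteriors, component_estimates, k):
    """Multiple streams (assumed reading): replace the weight vector by k
    Kronecker deltas, i.e. keep the estimates of the k most probable
    components as k separate streams instead of averaging them."""
    ranked = sorted(range(len(posteriors)),
                    key=lambda i: posteriors[i], reverse=True)
    return [list(component_estimates[i]) for i in ranked[:k]]

# Toy example: three components, two-dimensional features.
posteriors = [0.5, 0.3, 0.2]
component_estimates = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

single = mmse_estimate(posteriors, component_estimates)
streams = kronecker_streams(posteriors, component_estimates, k=2)
```

In this toy case the single stream blends all three components into one vector, whereas the two streams keep the two most probable component estimates unmixed, leaving the choice between them to a later stage.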
Bibliographic reference. Stouten, Veronique / Hamme, Hugo Van / Wambacq, Patrick (2004): "Multiple stream model-based feature enhancement for noise robust speech recognition", In Robust2004, paper 12.