11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Mask Estimation in Non-Stationary Noise Environments for Missing Feature Based Robust Speech Recognition

Shirin Badiezadegan, Richard C. Rose

McGill University, Canada

This paper demonstrates the importance of accurately characterizing instantaneous acoustic noise for mask estimation in data imputation approaches to missing-feature-based ASR, especially in the presence of non-stationary background noise. Mask estimation relies on a hypothesis test designed to detect the presence of speech in time-frequency spectral bins under rapidly varying noise conditions. Masked mel-frequency filter bank energies are reconstructed using an MMSE-based data imputation procedure. The impact of this mask estimation approach is evaluated in the context of MMSE-based data imputation under multiple background conditions over a range of SNRs using the Aurora 2 speech corpus.
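
As a rough illustration of what a per-bin mask looks like, the sketch below marks each time-frequency bin as reliable or masked by thresholding a local SNR estimate. This is not the hypothesis test used in the paper, which the abstract does not specify; the function name `estimate_mask`, the 0 dB threshold, and the toy energies are all assumptions made for illustration. The noise estimate is supplied per frame and per channel, which is what lets the decision follow non-stationary noise.

```python
import math


def estimate_mask(noisy, noise, threshold_db=0.0, floor=1e-10):
    """Return a binary mask: 1 = speech-dominated (reliable), 0 = masked.

    noisy: list of frames, each a list of filter bank energies (linear scale).
    noise: noise energy estimate with the same shape as `noisy`.
    threshold_db: assumed local-SNR decision threshold (hypothetical value).
    """
    mask = []
    for noisy_frame, noise_frame in zip(noisy, noise):
        row = []
        for y, n in zip(noisy_frame, noise_frame):
            # Crude speech energy estimate by subtracting the noise estimate.
            speech = max(y - n, floor)
            snr_db = 10.0 * math.log10(speech / max(n, floor))
            row.append(1 if snr_db > threshold_db else 0)
        mask.append(row)
    return mask


# Toy example: two frames, two filter bank channels.
noisy = [[10.0, 1.2], [3.0, 8.0]]
noise = [[1.0, 1.0], [2.5, 1.0]]
mask = estimate_mask(noisy, noise)
```

Bins flagged 0 are the ones a data imputation stage would then reconstruct from the reliable bins; that reconstruction step is not shown here.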


Bibliographic reference. Badiezadegan, Shirin / Rose, Richard C. (2010): "Mask estimation in non-stationary noise environments for missing feature based robust speech recognition", In INTERSPEECH-2010, 2062-2065.