The application of Missing Data Theory (MDT) has been shown to improve the robustness of automatic speech recognition (ASR) systems. A crucial part of an MDT-based recognizer is the computation of reliability masks from noisy data. To estimate accurate masks in environments with unknown, non-stationary noise statistics, we need to rely on a strong model of the speech. In this paper, an unsupervised technique based on non-negative matrix factorization (NMF) discovers phone-sized time-frequency patches into which speech can be decomposed. The input matrix for the NMF is constructed from a high-resolution, reassigned time-frequency representation. This representation facilitates accurate detection of the patches that are active in unseen noisy speech. After further denoising of the patch activations, speech and noise are reconstructed, and missing feature masks are estimated from these reconstructions. Recognition experiments on the Aurora2 database demonstrate the effectiveness of this technique.
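As a rough illustration of the two building blocks named in the abstract, the sketch below factors a non-negative matrix V into a dictionary W and activations H, and derives a binary reliability mask from separate speech and noise power estimates. It is not the authors' implementation. The Euclidean-cost multiplicative updates, the 0 dB threshold, and all function names (`nmf`, `binary_mask`, `matmul`, `transpose`) are assumptions made for illustration; the abstract does not state the cost function, the mask rule, or how the reassigned spectra are arranged into the input matrix.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Approximate non-negative V (F x T) as W (F x rank) times H (rank x T).

    Uses multiplicative updates for the Euclidean cost (an assumed choice).
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    H = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


def binary_mask(speech_power, noise_power, snr_db=0.0):
    """Mark a time-frequency cell reliable (1) when its estimated local SNR
    exceeds snr_db; both inputs are power estimates of the same shape."""
    threshold = 10 ** (snr_db / 10)  # dB threshold as a power ratio
    return [[1 if s > threshold * n else 0 for s, n in zip(sr, nr)]
            for sr, nr in zip(speech_power, noise_power)]
```

In a pipeline of the kind the abstract describes, W would be learned from speech data, H would be estimated on unseen noisy speech with W held fixed, and the reconstructed speech and noise would then be passed to a mask rule such as `binary_mask`. Those steps are left out here because the abstract gives no details on them.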
Bibliographic reference. Segbroeck, Maarten Van / Van hamme, Hugo (2009): "Applying non-negative matrix factorization on time-frequency reassignment spectra for missing data mask estimation", In INTERSPEECH-2009, 2511-2514.