Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Classifier-Based Mask Estimation for Missing Feature Methods of Robust Speech Recognition

Michael L. Seltzer, Bhiksha Raj, Richard M. Stern

Department of Electrical and Computer Engineering and School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

Missing feature methods of noise compensation for speech recognition operate by removing components of a spectrographic representation of speech that are considered to be corrupt, as indicated by a low signal-to-noise ratio. Recognition is either performed directly on the incomplete spectrograms, or the missing components are reconstructed prior to recognition. These methods require a spectrographic mask which accurately labels the reliable and corrupt regions of the spectrogram. Current methods of mask estimation rely on assumptions about the corrupting noise, such as stationarity. This is a significant drawback, since the missing feature methods themselves have no such restrictions. We present a new mask estimation technique that uses a Bayesian classifier to determine the reliability of spectrographic elements. The classifier's features were designed to make no assumptions about the corrupting noise signal, but rather to exploit characteristics of the speech signal itself. Missing feature compensation experiments were performed on speech corrupted by a variety of noises. In all cases, classifier-based mask estimation resulted in significantly better recognition accuracy than conventional mask estimation methods.
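
To make the notion of a spectrographic mask concrete, the following is a minimal sketch of a mask computed by thresholding the local signal-to-noise ratio of each time-frequency element. It is not the classifier-based estimator presented in the paper: it assumes the speech and noise power in every element are known separately, which is not the case in practice. The function name `snr_mask`, the 0 dB threshold, and the toy values are assumptions made purely for illustration.

```python
import math


def snr_mask(speech_power, noise_power, threshold_db=0.0):
    """Label each time-frequency element reliable (True) or corrupt (False).

    speech_power, noise_power: equally sized lists of rows (frames), each row
    holding the power in every frequency band. Both are assumed known here.
    threshold_db: illustrative SNR threshold, not a value from the paper.
    """
    mask = []
    for speech_row, noise_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(speech_row, noise_row):
            if n <= 0.0:
                row.append(True)    # no noise energy: treat as reliable
            elif s <= 0.0:
                row.append(False)   # no speech energy: treat as corrupt
            else:
                snr_db = 10.0 * math.log10(s / n)
                row.append(snr_db >= threshold_db)
        mask.append(row)
    return mask


def apply_mask(noisy_power, mask):
    """Remove corrupt elements by replacing them with None (missing)."""
    return [
        [value if reliable else None for value, reliable in zip(v_row, m_row)]
        for v_row, m_row in zip(noisy_power, mask)
    ]


# Toy 2-frame, 2-band example with made-up power values.
speech = [[4.0, 1.0], [0.5, 9.0]]
noise = [[1.0, 2.0], [1.0, 1.0]]
noisy = [[s + n for s, n in zip(sr, nr)] for sr, nr in zip(speech, noise)]

mask = snr_mask(speech, noise)
incomplete = apply_mask(noisy, mask)
```

In this sketch the elements marked `None` form the incomplete spectrogram on which recognition would be performed directly, or which would be reconstructed before recognition.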



Bibliographic reference. Seltzer, Michael L. / Raj, Bhiksha / Stern, Richard M. (2000): "Classifier-based mask estimation for missing feature methods of robust speech recognition", in ICSLP-2000, vol. 3, pp. 538-541.