Sixth International Conference on Spoken Language Processing (ICSLP 2000)

Beijing, China
October 16-20, 2000

Reconstruction of Damaged Spectrographic Features for Robust Speech Recognition

Bhiksha Raj (1), Michael L. Seltzer (2), Richard M. Stern (2)

(1) Compaq Computer Corporation, Cambridge, MA, USA
(2) Department of Electrical and Computer Engineering and School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

We present two missing-feature-based algorithms that recover noise-corrupted regions of spectrographic representations of speech for noise-robust speech recognition. In contrast to previously described approaches, these algorithms modify the incoming feature vector and require no changes to the speech recognition system. The first approach clusters the feature vectors representing clean speech; missing data are then recovered by estimating the spectral cluster in each analysis frame from the uncorrupted feature values. The second approach uses MAP procedures to estimate the values of missing data elements from their correlations with the features that are present. Both methods take into account the bounds on the clean spectrogram implied by the noisy spectrogram. Large improvements in recognition accuracy are observed when these methods are applied to speech corrupted by non-stationary noise, provided the locations of the corrupt regions of the spectrogram are known. We also present a new method of estimating the locations of corrupt regions in spectrograms that treats their identification as a problem of Bayesian classification. When combined with the best-performing reconstruction method, this method yields recognition accuracies comparable to those of the best previous data compensation algorithm on speech corrupted by white noise. It also provides significant improvements on speech corrupted by music when the global SNR of the corrupted signal is known a priori.
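The correlation-based MAP reconstruction with bounds can be illustrated for a single analysis frame with the sketch below. It is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that clean log-spectral vectors are modelled by a single Gaussian with known mean and covariance, that a mask marking the reliable components is already available, and that each noisy log-spectral value is an upper bound on the corresponding clean value (as holds for additive noise in the power domain). The function name `bounded_map_reconstruct` and its parameters are hypothetical. Each missing component is repeatedly set to its Gaussian conditional mean given all other components and then clipped at its bound, which is projected coordinate descent on the Gaussian log-likelihood.

```python
import numpy as np


def bounded_map_reconstruct(y, present, mean, cov, n_iter=50):
    """Fill in unreliable log-spectral components of one analysis frame.

    y         -- noisy log-spectral vector; also the upper bound on clean values
    present   -- boolean mask, True where the component of y is treated as reliable
    mean, cov -- parameters of a single Gaussian model of clean log-spectral
                 vectors (an assumption of this sketch, not the paper's model)
    n_iter    -- number of coordinate-descent sweeps over the missing components
    """
    y = np.asarray(y, dtype=float)
    present = np.asarray(present, dtype=bool)
    mean = np.asarray(mean, dtype=float)
    prec = np.linalg.inv(np.asarray(cov, dtype=float))  # precision matrix

    x = y.copy()  # reliable components are kept exactly as observed
    missing = np.flatnonzero(~present)
    # Initialise missing components at the prior mean, clipped to the bound.
    x[missing] = np.minimum(mean[missing], y[missing])

    for _ in range(n_iter):
        for i in missing:
            r = x - mean
            # Sum over j != i of prec[i, j] * (x_j - mean_j)
            coupling = prec[i] @ r - prec[i, i] * r[i]
            # Mode of x_i given every other component (Gaussian conditional mean)
            cond_mean = mean[i] - coupling / prec[i, i]
            # Clean log power cannot exceed noisy log power under additive noise
            x[i] = min(cond_mean, y[i])
    return x


# Toy example: three correlated channels, only the first one is reliable.
mean = np.zeros(3)
cov = np.array([[1.0, 0.8, 0.5], [0.8, 1.0, 0.6], [0.5, 0.6, 1.0]])
y = np.array([2.0, 0.5, 10.0])
present = np.array([True, False, False])
x_hat = bounded_map_reconstruct(y, present, mean, cov)
# x_hat is approximately [2.0, 0.5, 0.389]
```

When no bound is active, the sweeps converge to the familiar conditional mean of the missing components given the present ones; the bound only changes the estimate where the Gaussian prediction (here 1.6 for the second channel) exceeds the noisy observation.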


Bibliographic reference. Raj, Bhiksha / Seltzer, Michael L. / Stern, Richard M. (2000): "Reconstruction of damaged spectrographic features for robust speech recognition", In ICSLP-2000, vol.1, 357-360.