EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology
2nd INTERSPEECH Event

Aalborg, Denmark
September 3-7, 2001

ALGONQUIN: Iterating Laplace's Method to Remove Multiple Types of Acoustic Distortion for Robust Speech Recognition

Brendan J. Frey (1), Li Deng (2), Alex Acero (2), Trausti Kristjansson (3)

(1) University of Toronto, Canada; (2) Microsoft Research, USA; (3) University of Waterloo, Canada

We show how an iterative form of Laplace's method can be used to estimate the log-spectrum of clean speech from the log-spectrum of noisy, distorted speech, using a time-varying mixture model of the log-spectra of the clean speech, noise, channel distortion, and noisy speech. We use this method, called ALGONQUIN, to denoise speech features, which are then fed into a large-vocabulary speech recognizer whose word error rate (WER) on the clean WSJ data is 4.9%. When 10 dB of time-varying airplane engine noise is added to the data, the recognizer obtains a WER of 28.8%. ALGONQUIN reduces the WER to 12.6%, well below the 25.0% obtained by spectral subtraction and close to the 9.7% obtained by retraining the recognizer on training data corrupted by exactly the same noise. If ALGONQUIN is used to denoise the noisy training data before the recognizer is retrained, the WER drops to 8.5%. For 10 dB of white noise, spectral subtraction reduces the WER from 55.1% to 33.8%, whereas ALGONQUIN reduces it to 14.2%. The recognizer trained on noisy data obtains a WER of 14.0%, and the recognizer trained on noisy data denoised by ALGONQUIN obtains a WER of 9.9%.
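
The absolute WER figures above can also be read as relative error reductions against the unprocessed noisy baseline. The following is a minimal sketch of that arithmetic, using only the WER values quoted in the abstract; the function name `relative_wer_reduction` and the dictionary layout are illustrative assumptions, not part of the paper.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Return the relative WER reduction, in percent, of a system over a baseline."""
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# WER values (percent) as quoted in the abstract, for the recognizer trained on clean data.
# The key names are hypothetical labels chosen for this sketch.
reported_wer = {
    "airplane": {"no_processing": 28.8, "spectral_subtraction": 25.0, "algonquin": 12.6},
    "white": {"no_processing": 55.1, "spectral_subtraction": 33.8, "algonquin": 14.2},
}

for noise_type, wers in reported_wer.items():
    baseline = wers["no_processing"]
    for method in ("spectral_subtraction", "algonquin"):
        reduction = relative_wer_reduction(baseline, wers[method])
        print(f"{noise_type:8s} {method:21s} {reduction:5.1f}% relative WER reduction")
```

Under these figures, ALGONQUIN removes a bit more than half of the errors for airplane engine noise and roughly three quarters of the errors for white noise, relative to the unprocessed noisy input.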

Bibliographic reference. Frey, Brendan J. / Deng, Li / Acero, Alex / Kristjansson, Trausti (2001): "ALGONQUIN: iterating Laplace's method to remove multiple types of acoustic distortion for robust speech recognition", in EUROSPEECH-2001, 901-904.