13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Recurrent Neural Networks for Noise Reduction in Robust ASR

Andrew L. Maas (1), Quoc V. Le (1), Tyler M. O'Neil (1), Oriol Vinyals (2), Patrick Nguyen (3), Andrew Y. Ng (1)

(1) Computer Science Department, Stanford University, CA, USA
(2) EECS Department, University of California - Berkeley, Berkeley, CA, USA
(3) Google, Inc., Mountain View, CA, USA

Recent work on deep neural networks as acoustic models for automatic speech recognition (ASR) has demonstrated substantial performance improvements. We introduce a model which uses a deep recurrent autoencoder neural network to denoise input features for robust ASR. The model is trained on stereo (noisy and clean) audio features to predict clean features given noisy input. The model makes no assumptions about how noise affects the signal, nor about the existence of distinct noise environments. Instead, it can learn to model any type of distortion or additive noise given sufficient training data. We demonstrate that the model is competitive with existing feature denoising approaches on the Aurora2 task, and that it outperforms a tandem approach where deep networks are used to predict phoneme posteriors directly.
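
The training setup described in the abstract (noisy features in, clean features as the regression target) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the single tanh recurrent layer, the layer sizes, the mean squared error objective, the synthetic Gaussian noise, and the hypothetical names `init_params`, `denoise`, and `mse`. The sketch shows only the forward pass and the loss on one toy stereo pair; the parameter updates that training would require are omitted.

```python
import math
import random


def init_params(n_in, n_hidden, seed=0):
    """Small random weights for a one-hidden-layer recurrent denoiser (assumed layout)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_in": mat(n_hidden, n_in),       # noisy input frame -> hidden
        "W_rec": mat(n_hidden, n_hidden),  # previous hidden state -> hidden
        "b_h": [0.0] * n_hidden,
        "W_out": mat(n_in, n_hidden),      # hidden -> estimate of the clean frame
        "b_out": [0.0] * n_in,
    }


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def denoise(params, noisy_frames):
    """Run the recurrent network over a sequence of noisy feature frames."""
    hidden = [0.0] * len(params["b_h"])
    outputs = []
    for frame in noisy_frames:
        from_input = matvec(params["W_in"], frame)
        from_past = matvec(params["W_rec"], hidden)
        hidden = [
            math.tanh(a + r + b)
            for a, r, b in zip(from_input, from_past, params["b_h"])
        ]
        # Linear output layer: one clean-feature estimate per input frame.
        estimate = [
            o + b for o, b in zip(matvec(params["W_out"], hidden), params["b_out"])
        ]
        outputs.append(estimate)
    return outputs


def mse(predicted, target):
    """Mean squared error between predicted and clean frames (assumed objective)."""
    total, count = 0.0, 0
    for p_frame, t_frame in zip(predicted, target):
        for p, t in zip(p_frame, t_frame):
            total += (p - t) ** 2
            count += 1
    return total / count


# Toy stereo pair: clean features plus synthetic additive noise (illustrative only).
rng = random.Random(1)
n_features, n_frames = 4, 6
clean = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_frames)]
noisy = [[c + rng.gauss(0, 0.3) for c in frame] for frame in clean]

params = init_params(n_features, n_hidden=8)
estimate = denoise(params, noisy)
loss = mse(estimate, clean)  # the quantity training would minimize
```

Because the target is the clean half of the stereo pair rather than the input itself, a network of this kind is not given an explicit noise model; in this sketch the noise enters only through the toy data.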

Index Terms: neural networks, robust ASR, deep learning


Bibliographic reference. Maas, Andrew L. / Le, Quoc V. / O'Neil, Tyler M. / Vinyals, Oriol / Nguyen, Patrick / Ng, Andrew Y. (2012): "Recurrent neural networks for noise reduction in robust ASR", In INTERSPEECH-2012, 22-25.