16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Deep Neural Network Based Spectral Feature Mapping for Robust Speech Recognition

Kun Han, Yanzhang He, Deblin Bagchi, Eric Fosler-Lussier, DeLiang Wang

Ohio State University, USA

Automatic speech recognition (ASR) systems suffer from performance degradation under noisy and reverberant conditions. In this work, we explore a deep neural network (DNN) based approach for spectral feature mapping from corrupted speech to clean speech. The DNN-based mapping substantially reduces interference and produces estimated clean spectral features for ASR training and decoding. We experiment with several different feature mapping approaches and demonstrate that a DNN trained to predict clean log filterbank coefficients directly from noisy spectrograms can be extremely effective. The experiments show that ASR systems using these cleaned features perform well under joint noisy and reverberant conditions, and achieve state-of-the-art results on the CHiME-2 corpus with stereo (corrupted and clean) data.
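The mapping described in the abstract, a DNN regressing from noisy spectral input to clean log filterbank coefficients, can be sketched as a feedforward network. The sketch below is a minimal NumPy forward pass only; the layer sizes, context window, and filterbank dimension are illustrative assumptions, not the paper's actual configuration, and real weights would be trained by backpropagation against the stereo (noisy, clean) data the abstract mentions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions (hypothetical, not from the paper): 257-bin noisy
# spectrogram frames with +/-5 frames of context, mapped to 40 clean
# log filterbank coefficients through two hidden layers.
CONTEXT, N_FFT_BINS, N_FBANK = 5, 257, 40
IN_DIM = (2 * CONTEXT + 1) * N_FFT_BINS

def init_layer(n_in, n_out):
    # Small random weights; in the actual system these would be learned
    # from parallel corrupted/clean training data.
    return rng.normal(0.0, 0.01, (n_in, n_out)), np.zeros(n_out)

layers = [
    init_layer(IN_DIM, 1024),
    init_layer(1024, 1024),
    init_layer(1024, N_FBANK),
]

def map_features(noisy_window):
    """Forward pass: noisy context window -> estimated clean log filterbank."""
    h = noisy_window
    for i, (W, b) in enumerate(layers):
        h = h @ W + b
        if i < len(layers) - 1:
            h = np.maximum(h, 0.0)  # ReLU hidden units
    return h  # linear output layer, since this is a regression task

# One spliced frame of noisy input produces one frame of cleaned features,
# which would then feed the ASR front end for training and decoding.
x = rng.normal(size=IN_DIM)
clean_estimate = map_features(x)
```

The linear output layer reflects that the targets (log filterbank values) are unbounded real numbers, so a squashing nonlinearity at the output would be inappropriate for this regression.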


Bibliographic reference.  Han, Kun / He, Yanzhang / Bagchi, Deblin / Fosler-Lussier, Eric / Wang, DeLiang (2015): "Deep neural network based spectral feature mapping for robust speech recognition", In INTERSPEECH-2015, 2484-2488.