Automatic speech recognition (ASR) systems suffer performance degradation under noisy and reverberant conditions. In this work, we explore a deep neural network (DNN) based approach for spectral feature mapping from corrupted speech to clean speech. The DNN-based mapping substantially reduces interference and produces estimated clean spectral features for ASR training and decoding. We experiment with several feature mapping approaches and demonstrate that a DNN trained to predict clean log filterbank coefficients directly from noisy spectrograms can be highly effective. The experiments show that ASR systems using these cleaned features perform well under jointly noisy and reverberant conditions, and achieve state-of-the-art results on the CHiME-2 corpus with stereo (corrupted and clean) data.
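The described mapping can be illustrated with a minimal sketch: a small feedforward network trained with a mean-squared-error objective to map noisy input frames to clean log filterbank targets, mimicking training on stereo (corrupted/clean) data. All dimensions, the single hidden layer, and the synthetic data below are illustrative assumptions, not the paper's actual configuration (which uses deeper DNNs with context windows over spectrogram frames).

```python
import numpy as np

# Toy dimensions (assumptions): 11-frame context of 40-bin noisy features
# mapped to a single 40-dimensional clean log filterbank frame.
rng = np.random.default_rng(0)
n_frames, ctx, out_dim, hidden = 256, 11, 40, 64
in_dim = ctx * out_dim

# Synthetic "stereo" data: clean targets plus noisy, context-expanded inputs.
clean = rng.standard_normal((n_frames, out_dim))
noisy = np.repeat(clean, ctx, axis=1) + 0.5 * rng.standard_normal((n_frames, in_dim))

# One-hidden-layer feature-mapping network.
W1 = 0.01 * rng.standard_normal((in_dim, hidden)); b1 = np.zeros(hidden)
W2 = 0.01 * rng.standard_normal((hidden, out_dim)); b2 = np.zeros(out_dim)

def forward(x):
    h = np.maximum(0.0, x @ W1 + b1)   # ReLU hidden layer
    return h, h @ W2 + b2              # linear output: estimated clean features

# Train with MSE between predicted and clean features (plain gradient descent).
losses, lr = [], 1e-3
for step in range(200):
    h, pred = forward(noisy)
    err = pred - clean
    losses.append(float(np.mean(err ** 2)))
    gW2 = h.T @ err / n_frames; gb2 = err.mean(0)
    dh = (err @ W2.T) * (h > 0)        # backprop through ReLU
    gW1 = noisy.T @ dh / n_frames; gb1 = dh.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
```

At decode time, the trained network's outputs would replace the noisy features fed to the ASR system; here the only checkable claim is that training reduces the mapping error (`losses[0] > losses[-1]`).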
Bibliographic reference. Han, Kun / He, Yanzhang / Bagchi, Deblin / Fosler-Lussier, Eric / Wang, DeLiang (2015): "Deep neural network based spectral feature mapping for robust speech recognition", In INTERSPEECH-2015, 2484-2488.