16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Robust Speech Recognition Using DNN-HMM Acoustic Model Combining Noise-Aware Training with Spectral Subtraction

Akihiro Abe, Kazumasa Yamamoto, Seiichi Nakagawa

Toyohashi University of Technology, Japan

Acoustic models based on deep neural networks (DNNs) have recently been introduced and have shown dramatic improvements over acoustic models based on Gaussian mixture models (GMMs) in a variety of tasks. In this paper, we consider how to improve the noise robustness of DNNs. Inspired by missing feature theory and static noise-aware training, we propose an approach that feeds both a noise-suppressed acoustic feature and estimated noise information to the DNN. For noise suppression we use simple spectral subtraction, and the noise is estimated either per utterance or per frame. In noisy speech recognition experiments, the proposed method outperformed the other methods we compared it with. With per-utterance noise estimation and log Mel filterbank features, we obtained a 28.6% word error rate reduction compared with multi-condition training and a 5.9% reduction compared with noise adaptive training.
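
The input construction described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the use of the first few frames as a speech-free noise estimate, the over-subtraction factor `alpha`, and the floor `beta` are all assumptions made for illustration, and the sketch works on a plain power spectrogram rather than log Mel filterbank features.

```python
import numpy as np


def estimate_noise_per_utterance(power_spec, n_lead_frames=10):
    """Per-utterance noise estimate: mean of the leading frames.

    Assumption (not from the paper): the first n_lead_frames frames
    contain no speech.
    """
    return power_spec[:n_lead_frames].mean(axis=0)


def spectral_subtraction(power_spec, noise_est, alpha=1.0, beta=0.01):
    """Subtract the noise estimate from every frame.

    alpha (over-subtraction) and beta (spectral floor) are hypothetical
    parameters; the floor keeps the result strictly positive.
    """
    cleaned = power_spec - alpha * noise_est
    return np.maximum(cleaned, beta * power_spec)


def noise_aware_input(power_spec, n_lead_frames=10, eps=1e-10):
    """Concatenate the log noise-suppressed feature with the log noise estimate.

    power_spec has shape (frames, bins); the result has shape
    (frames, 2 * bins), one input vector per frame.
    """
    noise_est = estimate_noise_per_utterance(power_spec, n_lead_frames)
    enhanced = spectral_subtraction(power_spec, noise_est)
    log_enhanced = np.log(enhanced + eps)
    log_noise = np.log(noise_est + eps)
    # The per-utterance estimate is the same for every frame, so repeat it.
    noise_tiled = np.tile(log_noise, (power_spec.shape[0], 1))
    return np.concatenate([log_enhanced, noise_tiled], axis=1)
```

A per-frame variant would replace the tiled vector with a noise estimate that is updated for each frame; the concatenation step stays the same.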

Bibliographic reference. Abe, Akihiro / Yamamoto, Kazumasa / Nakagawa, Seiichi (2015): "Robust speech recognition using DNN-HMM acoustic model combining noise-aware training with spectral subtraction", in INTERSPEECH-2015, 2849-2853.