We propose three algorithms to address the mismatch problem in deep neural network (DNN) based speech enhancement. First, we investigate noise-aware training, which incorporates noise information from the test utterance via an ideal binary mask (IBM) based dynamic noise estimation approach to improve the DNN's ability to separate speech from the noisy signal. Next, a set of more than 100 noise types is adopted to improve the DNN's generalization to unseen and non-stationary noise conditions. Finally, the quality of the enhanced speech is further improved by global variance equalization. Empirical results show that each of the three proposed techniques contributes to the performance improvement. Compared to the conventional logarithmic minimum mean squared error speech enhancement method, our DNN system achieves a 0.32 improvement in PESQ (perceptual evaluation of speech quality) across six signal-to-noise ratio levels ranging from -5 dB to 20 dB on a test set with unknown noise types. We also observe that the combined strategies suppress highly non-stationary noise better than all the competing state-of-the-art techniques we evaluated.
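Two of the abstract's techniques can be illustrated concretely. The sketch below is an assumption about the general recipe, not the paper's actual code: it estimates a per-utterance noise spectrum from frames that an IBM-style local-SNR decision labels noise-dominated, appends that estimate to every frame's input features (noise-aware training), and then rescales enhanced features so their per-dimension variance matches a reference global variance (global variance equalization). The threshold value, the frame-level voting rule, and the function names are all hypothetical choices for the sketch.

```python
import numpy as np

def dynamic_noise_aware_features(noisy_logpow, snr_threshold=2.0):
    """Append an IBM-style dynamic noise estimate to each frame's features.

    noisy_logpow: (frames, bins) log-power spectrogram of the noisy utterance.
    Returns a (frames, 2 * bins) noise-aware input matrix.
    """
    power = np.exp(noisy_logpow)              # back to the linear power domain
    floor = power.min(axis=0) + 1e-10         # crude per-bin noise-floor track
    # IBM-style decision: a bin is speech-dominated if its power exceeds the
    # floor by the (hypothetical) local-SNR threshold.
    speech_mask = power > snr_threshold * floor
    # Frames that are mostly noise-dominated contribute to the noise estimate.
    noise_frames = speech_mask.mean(axis=1) < 0.5
    if not noise_frames.any():
        noise_frames[:] = True                # fall back to averaging all frames
    noise_est = np.log(power[noise_frames].mean(axis=0) + 1e-10)
    # Tile the utterance-level noise estimate onto every frame's input vector.
    tiled = np.tile(noise_est, (noisy_logpow.shape[0], 1))
    return np.concatenate([noisy_logpow, tiled], axis=1)

def gv_equalize(enhanced_logpow, reference_gv):
    """Global variance equalization: rescale each feature dimension so its
    variance matches a reference variance (e.g. measured on clean training
    data), compensating the over-smoothing of MMSE-trained DNN outputs."""
    mean = enhanced_logpow.mean(axis=0)
    gv = enhanced_logpow.var(axis=0) + 1e-10
    return mean + np.sqrt(reference_gv / gv) * (enhanced_logpow - mean)
```

Note that `gv_equalize` preserves the per-dimension mean and only stretches deviations around it, so the equalized output has exactly the reference variance in each dimension.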
Bibliographic reference. Xu, Yong / Du, Jun / Dai, Li-Rong / Lee, Chin-Hui (2014): "Dynamic noise aware training for speech enhancement based on deep neural networks", In INTERSPEECH-2014, 2670-2674.