Uncertainty Decoding with Adaptive Sampling for Noise Robust DNN-Based Acoustic Modeling

Dung T. Tran, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani

Although deep neural network (DNN)-based acoustic models have achieved remarkable results, automatic speech recognition (ASR) performance remains low in noisy and reverberant conditions. To address this issue, a speech enhancement front-end is often applied before recognition to reduce noise. However, the front-end cannot fully suppress noise and often introduces artifacts that limit the improvement in ASR performance. Uncertainty decoding has been proposed to better interconnect the speech enhancement front-end and the ASR back-end, and to mitigate the mismatch caused by residual noise and artifacts. By treating features as distributions instead of point estimates, uncertainty decoding modifies the conventional decoding rule to account for the uncertainty emanating from speech enhancement. Although uncertainty decoding has recently been investigated for DNN acoustic models, finding efficient ways to incorporate the distribution of the enhanced features into a DNN acoustic model still requires further investigation. In this paper, we propose to parameterize the distribution of the enhanced features and to estimate its parameters by backpropagation using an unsupervised adaptation scheme. We demonstrate the effectiveness of the proposed approach on real audio data from the CHiME3 dataset.
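The abstract describes two steps: replacing the point estimate of the enhanced feature with a distribution, and learning that distribution's parameters by backpropagation without transcriptions. The following is a minimal sketch of one way to realize both, not the authors' implementation. It assumes details the abstract does not state: the enhanced feature x̂ is modeled as a diagonal Gaussian centered on x̂ with a learnable per-dimension log standard deviation `log_sigma` shared across frames; the uncertainty-decoding posterior is approximated by averaging the acoustic model's state posteriors over samples x_n = x̂ + σ·ε_n (the reparameterization trick, so gradients reach σ); a single linear-softmax layer stands in for the DNN; and frame labels are taken as given from a hypothetical first decoding pass. All function names are hypothetical.

```python
import math
import random


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, b, x):
    # Stand-in acoustic model (assumption): one linear layer, z_k = W[k] . x + b[k].
    return [sum(w * xd for w, xd in zip(row, x)) + bk for row, bk in zip(W, b)]


def draw_eps(num_frames, num_samples, dim, seed=0):
    # Standard-normal draws, fixed up front so every evaluation is deterministic.
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(dim)]
             for _ in range(num_samples)] for _ in range(num_frames)]


def mc_posterior(W, b, x_hat, log_sigma, eps_t):
    """p(k | x_hat) ~= (1/N) sum_n softmax(z(x_hat + sigma * eps_n))_k."""
    sigma = [math.exp(ls) for ls in log_sigma]
    avg = [0.0] * len(W)
    for e_n in eps_t:
        x_n = [m + s * e for m, s, e in zip(x_hat, sigma, e_n)]
        for k, p in enumerate(softmax(logits(W, b, x_n))):
            avg[k] += p / len(eps_t)
    return avg


def loss_and_grad(W, b, frames, labels, log_sigma, eps):
    """Mean of -log p_bar(y_t | x_hat_t) over frames and its gradient w.r.t. log_sigma."""
    sigma = [math.exp(ls) for ls in log_sigma]
    D, K, T = len(log_sigma), len(W), len(frames)
    loss, grad = 0.0, [0.0] * D
    for x_hat, y, eps_t in zip(frames, labels, eps):
        N = len(eps_t)
        p_bar, dp_bar = 0.0, [0.0] * D
        for e_n in eps_t:
            x_n = [m + s * e for m, s, e in zip(x_hat, sigma, e_n)]
            p_n = softmax(logits(W, b, x_n))
            p_bar += p_n[y] / N
            for d in range(D):
                # Softmax derivative: dp_y/dx_d = p_y * (W[y][d] - sum_k p_k * W[k][d]).
                dp_dx = p_n[y] * (W[y][d] - sum(p_n[k] * W[k][d] for k in range(K)))
                # Reparameterization: dx_d/dlog_sigma_d = sigma_d * eps_d.
                dp_bar[d] += dp_dx * sigma[d] * e_n[d] / N
        loss -= math.log(p_bar) / T
        for d in range(D):
            grad[d] -= dp_bar[d] / (p_bar * T)
    return loss, grad


def adapt_log_sigma(W, b, frames, labels, log_sigma, eps, lr=0.05, steps=10):
    """Plain gradient descent on log_sigma; the acoustic model (W, b) stays frozen."""
    log_sigma = list(log_sigma)
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(W, b, frames, labels, log_sigma, eps)
        history.append(loss)
        log_sigma = [ls - lr * g for ls, g in zip(log_sigma, grad)]
    history.append(loss_and_grad(W, b, frames, labels, log_sigma, eps)[0])
    return log_sigma, history


if __name__ == "__main__":
    W = [[1.0, -0.5], [-0.8, 0.9], [0.2, 0.3]]       # 3 toy states, 2-dim features
    b = [0.0, 0.1, -0.2]
    frames = [[0.4, -0.2], [-0.3, 0.5], [0.1, 0.1]]  # enhanced features x_hat_t
    labels = [1, 0, 2]                               # hypothetical first-pass labels
    eps = draw_eps(len(frames), num_samples=32, dim=2)
    log_sigma, history = adapt_log_sigma(W, b, frames, labels, [math.log(0.3)] * 2, eps)
    print(history[0], history[-1], [math.exp(ls) for ls in log_sigma])
```

Only `log_sigma` is updated while the acoustic model stays frozen, so the adaptation uses pseudo-labels rather than transcriptions. Parameterizing log σ keeps the standard deviation positive without explicit constraints. Drawing the noise once and reusing it makes the objective deterministic, which simplifies gradient checking; resampling at every step is a common alternative.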

DOI: 10.21437/Interspeech.2017-793

Cite as: Tran, D.T., Delcroix, M., Ogawa, A., Nakatani, T. (2017) Uncertainty Decoding with Adaptive Sampling for Noise Robust DNN-Based Acoustic Modeling. Proc. Interspeech 2017, 3852-3856, DOI: 10.21437/Interspeech.2017-793.
