ISCA Archive Interspeech 2015

Integration of DNN based speech enhancement and ASR

Ramón F. Astudillo, Joana Correia, Isabel Trancoso

Speech enhancement employing Deep Neural Networks (DNNs) is gaining ground as a data-driven alternative to classical Minimum Mean Square Error (MMSE) enhancement approaches. In the past, Observation Uncertainty approaches that integrate MMSE speech enhancement with Automatic Speech Recognition (ASR) have yielded good results as a lightweight alternative for robust ASR. In this paper we therefore explore the integration of DNN-based speech enhancement with ASR by employing Observation Uncertainty techniques. For this purpose, we explore various techniques and approximations that allow propagating the uncertainty of the DNN's inference into the feature domain. This uncertainty can then be used to dynamically compensate the ASR model using techniques such as uncertainty decoding. We test the proposed techniques on the AURORA4 corpus and show that notable improvements can be attained over the already effective DNN enhancement.
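
To illustrate the general idea behind uncertainty decoding mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the enhancement stage delivers, per frame, a feature estimate together with a per-dimension variance, and that the ASR acoustic model scores features with diagonal-covariance Gaussians. Under those assumptions, the uncertainty is compensated by adding the feature variance to the model variance before evaluating the likelihood. The function names `log_gauss_diag` and `uncertainty_decoding_loglik` are hypothetical and chosen only for this example.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def uncertainty_decoding_loglik(feat_mean, feat_var, model_mean, model_var):
    """Log-likelihood of an uncertain feature under one Gaussian component.

    Assumption: the feature is itself Gaussian with mean `feat_mean` and
    diagonal variance `feat_var`. Marginalizing over it yields a Gaussian
    whose variance is the sum of model and feature variances.
    """
    inflated_var = [mv + fv for mv, fv in zip(model_var, feat_var)]
    return log_gauss_diag(feat_mean, model_mean, inflated_var)


if __name__ == "__main__":
    # Hypothetical one-dimensional frame that lies far from the model mean.
    plain = uncertainty_decoding_loglik([3.0], [0.0], [0.0], [1.0])
    compensated = uncertainty_decoding_loglik([3.0], [4.0], [0.0], [1.0])
    print(plain, compensated)
```

With zero feature variance the score reduces to the ordinary Gaussian log-likelihood. For a feature far from the model mean, a larger feature variance flattens the component and penalizes the mismatch less, which is how unreliable frames end up with reduced influence during decoding.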

doi: 10.21437/Interspeech.2015-709

Cite as: Astudillo, R.F., Correia, J., Trancoso, I. (2015) Integration of DNN based speech enhancement and ASR. Proc. Interspeech 2015, 3576-3580, doi: 10.21437/Interspeech.2015-709

  author={Ramón F. Astudillo and Joana Correia and Isabel Trancoso},
  title={{Integration of DNN based speech enhancement and ASR}},
  booktitle={Proc. Interspeech 2015},