16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Integration of DNN Based Speech Enhancement and ASR

Ramón F. Astudillo, Joana Correia, Isabel Trancoso

INESC-ID Lisboa, Portugal

Speech enhancement employing Deep Neural Networks (DNNs) is gaining strength as a data-driven alternative to classical Minimum Mean Square Error (MMSE) enhancement approaches. In the past, Observation Uncertainty approaches that integrate MMSE speech enhancement with Automatic Speech Recognition (ASR) have yielded good results as a lightweight alternative for robust ASR. In this paper we therefore explore the integration of DNN-based speech enhancement with ASR by employing Observation Uncertainty techniques. For this purpose, we explore various techniques and approximations that allow propagating the uncertainty of the DNN's inference into the feature domain. This uncertainty can then be used to dynamically compensate the ASR model using techniques such as uncertainty decoding. We test the proposed techniques on the AURORA4 corpus and show that notable improvements can be attained over the already effective DNN enhancement.
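As an illustration of how a feature-domain uncertainty can compensate an acoustic model, the following is a minimal sketch of uncertainty decoding in its textbook Gaussian form. It assumes a diagonal-covariance Gaussian acoustic model and a Gaussian posterior over the clean feature; the abstract does not state which model or approximation the paper uses. Under those assumptions, the expected model likelihood is the Gaussian evaluated at the posterior mean with the feature variance added to the model variance. The function names and toy values are hypothetical.

```python
import numpy as np


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def uncertainty_decoding_loglik(feat_mean, feat_var, model_mean, model_var):
    """Expected model likelihood under a Gaussian feature posterior.

    Integrating N(x; model_mean, model_var) against the posterior
    N(x; feat_mean, feat_var) gives, for diagonal covariances,
    N(feat_mean; model_mean, model_var + feat_var).
    """
    return log_gauss_diag(feat_mean, model_mean, model_var + feat_var)


# Hypothetical toy values: an enhanced feature vector with per-dimension
# uncertainty, scored against one Gaussian of the acoustic model.
feat_mean = np.array([0.5, -1.0])   # point estimate from the enhancement stage
feat_var = np.array([0.2, 2.0])     # propagated uncertainty (assumed given)
model_mean = np.array([0.0, 1.0])
model_var = np.array([1.0, 0.5])

plain = log_gauss_diag(feat_mean, model_mean, model_var)
compensated = uncertainty_decoding_loglik(feat_mean, feat_var, model_mean, model_var)
```

In this toy case the second feature dimension is both uncertain and far from the model mean, so inflating the variance reduces its penalty and the compensated score is higher than the plain one. With zero feature variance the two scores coincide, which recovers conventional decoding on the point estimate.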


Bibliographic reference. Astudillo, Ramón F. / Correia, Joana / Trancoso, Isabel (2015): "Integration of DNN based speech enhancement and ASR", In INTERSPEECH-2015, 3576-3580.