Bridging the Gap Between Monaural Speech Enhancement and Recognition with Distortion-Independent Acoustic Modeling

Peidong Wang, Ke Tan, DeLiang Wang


Monaural speech enhancement has made dramatic advances in recent years. Although enhanced speech has been demonstrated to have better intelligibility and quality for human listeners, feeding it directly to automatic speech recognition (ASR) systems trained with noisy speech has not produced the expected improvements in ASR performance. This lack of an enhancement benefit on recognition, or the gap between monaural speech enhancement and recognition, is often attributed to speech distortions introduced in the enhancement process. In this study, we analyze the distortion problem and propose a distortion-independent acoustic modeling scheme. Experimental results show that the distortion-independent acoustic model is able to overcome the distortion problem. Moreover, it can be used with various speech enhancement models. Both the distortion-independent acoustic model and a noise-dependent acoustic model outperform the previous best system on the CHiME-2 corpus. The noise-dependent acoustic model achieves a word error rate of 8.7%, a 6.5% relative improvement over the previous best result.
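
The relative improvement quoted above is a ratio of word error rates (WER), not a difference in percentage points. The following is a minimal sketch of that arithmetic; the function names are illustrative and do not come from the paper. The baseline WER it computes is only what the abstract's rounded figures imply, not a number reported in the abstract, so the true value may differ slightly.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fractional WER reduction of new_wer relative to baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline_wer(new_wer: float, relative_reduction: float) -> float:
    """Baseline WER implied by a new WER and a relative reduction in [0, 1)."""
    if not 0 <= relative_reduction < 1:
        raise ValueError("relative_reduction must be in [0, 1)")
    return new_wer / (1.0 - relative_reduction)


# Figures taken from the abstract: 8.7% WER, 6.5% relative improvement.
# The baseline below is inferred from these rounded values (an assumption).
baseline = implied_baseline_wer(8.7, 0.065)
print(f"implied previous best WER: {baseline:.1f}%")
print(f"relative reduction: {relative_wer_reduction(baseline, 8.7):.3f}")
```

Read this way, a 6.5% relative improvement corresponds to a drop of roughly 0.6 percentage points in absolute WER.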


DOI: 10.21437/Interspeech.2019-1495

Cite as: Wang, P., Tan, K., Wang, D. (2019) Bridging the Gap Between Monaural Speech Enhancement and Recognition with Distortion-Independent Acoustic Modeling. Proc. Interspeech 2019, 471-475, DOI: 10.21437/Interspeech.2019-1495.


@inproceedings{Wang2019,
  author={Peidong Wang and Ke Tan and DeLiang Wang},
  title={{Bridging the Gap Between Monaural Speech Enhancement and Recognition with Distortion-Independent Acoustic Modeling}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={471--475},
  doi={10.21437/Interspeech.2019-1495},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1495}
}