ISCA Archive Interspeech 2013

Robust speech enhancement techniques for ASR in non-stationary noise and dynamic environments

Gang Liu, Dimitrios Dimitriadis, Enrico Bocchieri

In current ASR systems, the presence of competing speakers greatly degrades recognition performance. This degradation is even more prominent in hands-free, far-field ASR systems such as "Smart-TV" systems, where reverberation and non-stationary noise pose additional challenges. Furthermore, speakers most often do not stand still while speaking. To address these issues, we propose a cascaded system that includes Time Difference of Arrival estimation, multi-channel Wiener filtering, non-negative matrix factorization (NMF), multi-condition training, and robust feature extraction, where each stage additively improves the overall performance. The final cascaded system yields average relative improvements in ASR word accuracy of 50% and 45% for the CHiME 2011 (non-stationary noise) and CHiME 2012 (non-stationary noise plus speaker head movement) tasks, respectively.
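The abstract names NMF-based enhancement as one stage of the cascade. The sketch below illustrates the general idea of supervised NMF denoising on a magnitude spectrogram, assuming pre-trained speech and noise spectral dictionaries (W_speech, W_noise) and using multiplicative KL-divergence updates followed by a soft, Wiener-like mask; the variable names and update scheme are illustrative assumptions, not the authors' implementation.

import numpy as np

def nmf_enhance(V, W_speech, W_noise, n_iter=100, eps=1e-10):
    # Illustrative sketch only: supervised NMF denoising with fixed,
    # pre-trained dictionaries; not the system described in the paper.
    W = np.hstack([W_speech, W_noise])            # shape: (freq, k_s + k_n)
    H = np.abs(np.random.rand(W.shape[1], V.shape[1]))
    col_sums = W.sum(axis=0)[:, None] + eps       # normalizer for the KL update
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / col_sums          # multiplicative KL update of H
    k_s = W_speech.shape[1]
    S_hat = W_speech @ H[:k_s]                    # speech-only reconstruction
    N_hat = W_noise @ H[k_s:]                     # noise-only reconstruction
    mask = S_hat / (S_hat + N_hat + eps)          # soft, Wiener-like mask
    return mask * V                               # enhanced magnitude spectrogram

In practice, V would be the magnitude STFT of the (beamformed) mixture, and the enhanced magnitude would be recombined with the mixture phase before inverse STFT and feature extraction.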


doi: 10.21437/Interspeech.2013-281

Cite as: Liu, G., Dimitriadis, D., Bocchieri, E. (2013) Robust speech enhancement techniques for ASR in non-stationary noise and dynamic environments. Proc. Interspeech 2013, 3017-3021, doi: 10.21437/Interspeech.2013-281

@inproceedings{liu13c_interspeech,
  author={Gang Liu and Dimitrios Dimitriadis and Enrico Bocchieri},
  title={{Robust speech enhancement techniques for ASR in non-stationary noise and dynamic environments}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={3017--3021},
  doi={10.21437/Interspeech.2013-281}
}