ISCA Archive Interspeech 2015

Time-frequency masking for large scale robust speech recognition

Yuxuan Wang, Ananya Misra, Kean K. Chin

Time-frequency mask estimation has shown considerable success recently. In this paper, we demonstrate its utility as a feature enhancement frontend for large vocabulary conversational speech recognition. Additionally, we investigate how masking compares with feature denoising, which directly reconstructs clean features from noisy ones. We train a mask estimator that predicts ideal ratio masks. Experimental results on Google voice search evaluation sets demonstrate that masking is superior to feature denoising, and that a lightweight masking frontend produces significant improvements over a strong baseline. We also show that masking improves the performance of a multi-condition trained (MTR) acoustic model.
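The following sketch illustrates what a ratio-mask frontend computes. It assumes the common ideal ratio mask definition IRM(t, f) = (S²(t, f) / (S²(t, f) + N²(t, f)))^β with β = 0.5, where S² and N² are the clean-speech and noise energies in each time-frequency bin, and it assumes the mask is applied to noisy magnitudes. The abstract does not state the exact feature domain, exponent, or compression the authors used. Those choices, and the hypothetical helper names `ideal_ratio_mask` and `apply_mask`, are illustrative assumptions and do not reproduce the paper's implementation.

```python
import math


def ideal_ratio_mask(clean_power, noise_power, beta=0.5, eps=1e-12):
    """Hypothetical helper: per-bin IRM = (S^2 / (S^2 + N^2)) ** beta.

    Inputs are [frames][bins] lists of energies (already squared).
    beta=0.5 is an assumed, commonly used exponent, not taken from the paper.
    """
    return [
        [(s / (s + n + eps)) ** beta for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(clean_power, noise_power)
    ]


def apply_mask(noisy_magnitude, mask):
    """Hypothetical helper: scale each noisy magnitude bin by its mask value."""
    return [
        [y * m for y, m in zip(y_row, m_row)]
        for y_row, m_row in zip(noisy_magnitude, mask)
    ]


# Toy example with one frame and two bins, assuming speech and noise
# energies add, so the noisy magnitude is sqrt(S^2 + N^2).
clean = [[4.0, 1.0]]
noise = [[0.0, 3.0]]
noisy_mag = [[math.sqrt(s + n) for s, n in zip(clean[0], noise[0])]]

mask = ideal_ratio_mask(clean, noise)
enhanced = apply_mask(noisy_mag, mask)
print([[round(v, 6) for v in row] for row in mask])      # [[1.0, 0.5]]
print([[round(v, 6) for v in row] for row in enhanced])  # [[2.0, 1.0]]
```

In this setup the IRM would serve only as the training target. At test time, clean and noise energies are unavailable, so the estimator's predicted mask takes their place. In the toy example, the noise-free bin keeps a mask of about 1. The bin where noise carries three quarters of the energy gets a mask of 0.5, which rescales the noisy magnitude 2.0 to the clean magnitude 1.0.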


doi: 10.21437/Interspeech.2015-533

Cite as: Wang, Y., Misra, A., Chin, K.K. (2015) Time-frequency masking for large scale robust speech recognition. Proc. Interspeech 2015, 2469-2473, doi: 10.21437/Interspeech.2015-533

@inproceedings{wang15h_interspeech,
  author={Yuxuan Wang and Ananya Misra and Kean K. Chin},
  title={{Time-frequency masking for large scale robust speech recognition}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={2469--2473},
  doi={10.21437/Interspeech.2015-533}
}