Acoustic Model Ensembling Using Effective Data Augmentation for CHiME-5 Challenge

Feng Ma, Li Chai, Jun Du, Diyuan Liu, Zhongfu Ye, Chin-Hui Lee


CHiME-5 is a research community challenge targeting the problem of far-field, multi-talker conversational speech recognition in dinner party scenarios involving background noise, reverberation and overlapping speech. In this study, we present five different kinds of robust acoustic models that take advantage of both effective data augmentation and ensemble methods to improve recognition performance for the CHiME-5 challenge. First, we detail the effective data augmentation for far-field scenarios, especially the far-field data simulation. Unlike conventional data simulation methods, we use a signal processing method originally developed for channel identification to estimate the room impulse responses and then simulate the far-field data. Second, we introduce the five different kinds of robust acoustic models. Finally, the effectiveness of our acoustic model ensembling strategies at the lattice level and the state posterior level is evaluated and demonstrated. Our system achieves the best performance on all four tasks among the systems submitted to the CHiME-5 challenge.
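The far-field simulation described above amounts to convolving close-talk speech with an estimated room impulse response (RIR) and mixing in noise at a target SNR. Below is a minimal, hypothetical sketch of that final simulation step; the function name, arguments, and SNR-mixing scheme are illustrative assumptions, and the paper's actual pipeline (including its channel-identification-based RIR estimation) is not reproduced here.

```python
import numpy as np

def simulate_far_field(clean, rir, noise, snr_db):
    """Hypothetical far-field simulation sketch (not the authors' exact code):
    convolve close-talk speech with an estimated RIR, then add noise
    scaled to a target SNR relative to the reverberant signal."""
    # Reverberate the clean signal; truncate to the original length.
    reverberant = np.convolve(clean, rir)[: len(clean)]
    noise = noise[: len(reverberant)]
    sig_pow = np.mean(reverberant ** 2)
    noise_pow = np.mean(noise ** 2) + 1e-12  # guard against silent noise
    # Solve for the noise gain that yields the requested SNR (in dB).
    scale = np.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10)))
    return reverberant + scale * noise
```

In practice one would draw the RIR from responses estimated on the CHiME-5 arrays and the noise from recorded background segments, then feed the simulated utterances into acoustic model training alongside the real far-field data.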


DOI: 10.21437/Interspeech.2019-2601

Cite as: Ma, F., Chai, L., Du, J., Liu, D., Ye, Z., Lee, C. (2019) Acoustic Model Ensembling Using Effective Data Augmentation for CHiME-5 Challenge. Proc. Interspeech 2019, 1258-1262, DOI: 10.21437/Interspeech.2019-2601.


@inproceedings{Ma2019,
  author={Feng Ma and Li Chai and Jun Du and Diyuan Liu and Zhongfu Ye and Chin-Hui Lee},
  title={{Acoustic Model Ensembling Using Effective Data Augmentation for CHiME-5 Challenge}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={1258--1262},
  doi={10.21437/Interspeech.2019-2601},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2601}
}