The JHU ASR System for VOiCES from a Distance Challenge 2019

Yiming Wang, David Snyder, Hainan Xu, Vimal Manohar, Phani Sankar Nidadavolu, Daniel Povey, Sanjeev Khudanpur

This paper describes the system developed by the JHU team for the automatic speech recognition (ASR) task of the VOiCES from a Distance Challenge 2019, focusing on single-channel distant (far-field) audio under noisy conditions. We participated in the Fixed Condition track, in which systems may be trained only on an 80-hour subset of the LibriSpeech corpus provided by the organizers. The training data was first augmented with both background noise and simulated reverberation. We then trained factorized TDNN acoustic models that differed only in whether they used i-vectors for adaptation. Both systems used RNN language models, trained on the original and on reversed text, for rescoring. We submitted three systems: the system with i-vectors, which achieved a WER of 19.4% on the development set; the system without i-vectors, which achieved 19.0%; and their lattice-level fusion, which achieved 17.8%. On the evaluation set, our best system achieves a WER of 23.9%.
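The augmentation step mentioned above (background noise plus simulated reverberation) can be illustrated with a minimal, pure-Python sketch of the general technique. This is not the JHU team's pipeline. The function names, the toy room impulse response (RIR), the choice to measure SNR against the reverberant signal, and the 10 dB target are all hypothetical illustration values. The sketch reverberates a clean signal by convolving it with an RIR, then adds noise scaled to hit a target signal-to-noise ratio.

```python
import math
import random


def convolve(signal, rir):
    """Full linear convolution of a signal with a room impulse response (RIR)."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def add_noise(signal, noise, snr_db):
    """Scale noise so that 10*log10(P_signal / P_noise) == snr_db, then add it."""
    # Tile (or truncate) the noise to match the signal length.
    noise = [noise[i % len(noise)] for i in range(len(signal))]
    scale = math.sqrt(power(signal) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(signal, noise)]


def augment(clean, rir, noise, snr_db):
    """Reverberate, truncate to the original length, then add noise at snr_db.

    Assumption: SNR is measured against the reverberant (not the clean) signal.
    """
    reverberant = convolve(clean, rir)[: len(clean)]
    return add_noise(reverberant, noise, snr_db)


# Usage with toy data: a 0.1 s, 440 Hz tone at 16 kHz.
random.seed(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
rir = [1.0, 0.0, 0.0, 0.5, 0.0, 0.25]  # hypothetical, crudely decaying RIR
noise = [random.gauss(0.0, 1.0) for _ in range(800)]  # synthetic "background" noise
noisy = augment(clean, rir, noise, snr_db=10.0)
```

In a real setup the RIRs would be simulated or recorded responses and the noise would come from a background-noise corpus, with SNRs sampled per utterance; the sketch only shows how the two corruptions compose.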

DOI: 10.21437/Interspeech.2019-1948

Cite as: Wang, Y., Snyder, D., Xu, H., Manohar, V., Nidadavolu, P.S., Povey, D., Khudanpur, S. (2019) The JHU ASR System for VOiCES from a Distance Challenge 2019. Proc. Interspeech 2019, 2488-2492, DOI: 10.21437/Interspeech.2019-1948.

  author={Yiming Wang and David Snyder and Hainan Xu and Vimal Manohar and Phani Sankar Nidadavolu and Daniel Povey and Sanjeev Khudanpur},
  title={{The JHU ASR System for VOiCES from a Distance Challenge 2019}},
  booktitle={Proc. Interspeech 2019},