The I2R’s ASR System for the VOiCES from a Distance Challenge 2019

Tze Yuang Chong, Kye Min Tan, Kah Kuan Teh, Chang Huai You, Hanwu Sun, Huy Dat Tran


This paper describes the development of the automatic speech recognition (ASR) system submitted to the VOiCES from a Distance Challenge 2019. In this challenge, we focused on the fixed condition, where the task is to recognize reverberant and noisy speech given only a limited amount of clean training data. In our system, the mismatch between the training and testing conditions was reduced by multi-style training, in which the training data were artificially contaminated with different reverberation and noise sources. In addition, the Weighted Prediction Error (WPE) algorithm was used to reduce reverberation in the evaluation data. To boost performance, acoustic models with different neural network architectures were trained and the resulting systems were fused to produce the final output. Moreover, an LSTM language model was used to rescore the lattices, compensating for the weak n-gram model trained on only the transcription text. Evaluated on the development set, our system achieved an average word error rate (WER) of 27.04%.
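The multi-style training described above can be illustrated with a minimal NumPy sketch: clean speech is convolved with a room impulse response and mixed with noise scaled to a target signal-to-noise ratio. This is a hypothetical illustration only; the paper's actual contamination pipeline (its RIR and noise corpora, SNR ranges, and tooling) is not specified here, and the function and variable names below are assumptions.

```python
import numpy as np

def contaminate(clean, rir, noise, snr_db):
    """Simulate a reverberant, noisy copy of a clean utterance.

    Hypothetical sketch of multi-style data contamination:
    1. convolve the clean signal with a room impulse response (reverberation),
    2. add noise scaled so the speech-to-noise power ratio equals `snr_db`.
    """
    # Reverberate, keeping the original length.
    reverberant = np.convolve(clean, rir)[: len(clean)]
    # Tile/truncate the noise to match the signal length.
    reps = int(np.ceil(len(reverberant) / len(noise)))
    noise = np.tile(noise, reps)[: len(reverberant)]
    # Gain that yields the requested signal-to-noise ratio (in dB).
    p_speech = np.mean(reverberant ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return reverberant + gain * noise

# Toy example: a 1-second 440 Hz tone at 16 kHz, a decaying synthetic RIR,
# and white noise mixed at 10 dB SNR.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
rir = np.exp(-np.arange(800) / 100.0) * rng.standard_normal(800)
noise = rng.standard_normal(16000)
noisy = contaminate(clean, rir, noise, snr_db=10.0)
```

In practice such contamination is applied on the fly or offline to every clean training utterance, with RIRs and noise segments drawn at random so the acoustic model sees many reverberation/noise conditions per utterance.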


DOI: 10.21437/Interspeech.2019-2130

Cite as: Chong, T.Y., Tan, K.M., Teh, K.K., You, C.H., Sun, H., Tran, H.D. (2019) The I2R’s ASR System for the VOiCES from a Distance Challenge 2019. Proc. Interspeech 2019, 2458-2462, DOI: 10.21437/Interspeech.2019-2130.


@inproceedings{Chong2019,
  author={Tze Yuang Chong and Kye Min Tan and Kah Kuan Teh and Chang Huai You and Hanwu Sun and Huy Dat Tran},
  title={{The I2R’s ASR System for the VOiCES from a Distance Challenge 2019}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2458--2462},
  doi={10.21437/Interspeech.2019-2130},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2130}
}