ISI ASR System for the Low Resource Speech Recognition Challenge for Indian Languages

Jayadev Billa


This paper describes the ISI ASR system used to generate ISI's submissions for the Gujarati, Tamil and Telugu speech recognition tasks of the Low Resource Speech Recognition Challenge for Indian Languages. The key constraints on this task were limited training data and the restriction that no external data be used. The ISI ASR system builds on our earlier work on data augmentation and dropout and on our current work on multilingual training. It is an Eesen-based end-to-end Long Short Term Memory (LSTM) automatic speech recognition (ASR) system trained with the Connectionist Temporal Classification (CTC) loss criterion. To the best of our knowledge, this is one of the first times such systems have been applied to low-resource languages with performance comparable to, and in some cases better than, hybrid DNN systems. Our best monolingual systems show a relative reduction in word error rate (WER) of between 6.5% and 25.5% compared to the challenge organizer's Time Delay Neural Network (TDNN) based baseline. We further extend these systems with multilingual training approaches, which yield an additional 4.5% to 11.1% relative reduction in WER as measured on the development set.
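
The results above are stated as relative reductions in WER. As a minimal sketch of how such figures are conventionally computed (not the challenge's scoring tool, and using made-up inputs rather than any result from the paper), WER can be taken as the word-level edit distance divided by the reference length, and the relative reduction as the WER difference divided by the baseline WER. The function names `word_error_rate` and `relative_wer_reduction` are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Fraction by which system_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - system_wer) / baseline_wer


# Illustrative values only; these are not figures reported in the paper.
example_wer = word_error_rate("a b c d", "a x c")
example_reduction = relative_wer_reduction(20.0, 15.0)
```

Under this definition, a relative reduction is a fraction of the baseline WER, so it is not the same as a difference in absolute WER points.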


DOI: 10.21437/Interspeech.2018-2473

Cite as: Billa, J. (2018) ISI ASR System for the Low Resource Speech Recognition Challenge for Indian Languages. Proc. Interspeech 2018, 3207-3211, DOI: 10.21437/Interspeech.2018-2473.


@inproceedings{Billa2018,
  author={Jayadev Billa},
  title={ISI ASR System for the Low Resource Speech Recognition Challenge for Indian Languages},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3207--3211},
  doi={10.21437/Interspeech.2018-2473},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2473}
}