TDNN-based Multilingual Speech Recognition System for Low Resource Indian Languages

Noor Fathima, Tanvina Patel, Mahima C, Anuroop Iyengar


India is a diverse and multilingual country, with vast linguistic variation across its billion-plus population. The lack of resources, in terms of transcribed speech data, phonetic pronunciation dictionaries (lexicons), and text collections, has hindered the development and improvement of ASR systems for Indic languages. With the Interspeech 2018 Special Session: Low Resource Speech Recognition Challenge for Indian Languages, efforts have been made to address this issue to an extent. In this paper, we exploit the fact that the shared phonetic properties of these languages are essential for improved ASR performance. We build a multilingual Time Delay Neural Network (TDNN) system that uses combined acoustic modeling and language-specific information to decode the input test sequences. Using this approach, for Tamil, Telugu, and Gujarati we obtain Word Error Rates (WERs) of 16.07%, 17.14%, and 17.69%, respectively, which was the second-best system at the challenge.
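The building block of such a system, a time-delay layer, computes each output frame from a spliced window of input frames (the "time-delay" context) followed by a nonlinearity. A minimal NumPy sketch is given below; the function name, context offsets, and dimensions are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def tdnn_layer(frames, weights, bias, context=(-2, -1, 0, 1, 2)):
    """One TDNN layer (illustrative sketch): each output frame is an
    affine transform of a spliced context window of input frames,
    followed by a ReLU.

    frames:  (T, D_in) acoustic feature matrix
    weights: (len(context) * D_in, D_out)
    bias:    (D_out,)
    """
    T, _ = frames.shape
    outputs = []
    for t in range(T):
        # Splice the context window, clamping indices at utterance edges.
        window = [frames[min(max(t + c, 0), T - 1)] for c in context]
        spliced = np.concatenate(window)
        outputs.append(np.maximum(spliced @ weights + bias, 0.0))  # ReLU
    return np.stack(outputs)  # (T, D_out)

# Toy example: 10 frames of 40-dim features -> 64-dim activations.
rng = np.random.default_rng(0)
feats = rng.standard_normal((10, 40))
W = rng.standard_normal((5 * 40, 64)) * 0.01
b = np.zeros(64)
out = tdnn_layer(feats, W, b)
```

Stacking such layers with progressively wider contexts lets deeper layers see long temporal spans cheaply, which is what makes the TDNN attractive for a shared multilingual acoustic model.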


DOI: 10.21437/Interspeech.2018-2117

Cite as: Fathima, N., Patel, T., C, M., Iyengar, A. (2018) TDNN-based Multilingual Speech Recognition System for Low Resource Indian Languages. Proc. Interspeech 2018, 3197-3201, DOI: 10.21437/Interspeech.2018-2117.


@inproceedings{Fathima2018,
  author={Noor Fathima and Tanvina Patel and Mahima C and Anuroop Iyengar},
  title={TDNN-based Multilingual Speech Recognition System for Low Resource Indian Languages},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3197--3201},
  doi={10.21437/Interspeech.2018-2117},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2117}
}