Leveraging Native Language Information for Improved Accented Speech Recognition

Shahram Ghorbani, John H.L. Hansen


Recognition of accented speech is a long-standing challenge for ASR systems, given the increasing worldwide population of bilingual speakers with English as their second language. If we consider foreign-accented speech as an interpolation of the native language (L1) and English (L2), a model that can simultaneously recognize both languages should perform better at the acoustic level for accented speech. In this study, we explore how an end-to-end recurrent neural network (RNN) system trained with English and native languages (Spanish and Indian languages) could leverage native-language data to perform better on accented English speech. To this end, we examine pre-training with native languages, as well as multitask learning in which the main task is trained with native English data and the secondary task is trained with Spanish or Indian languages. We show that the multitask setting performs better than the pre-training approach. We then propose a new multitask setting in which the secondary task is trained with both English and the native language, using the same output set. This proposed scenario outperforms the first multitask setting and provides +11.95% and +17.55% character error rate (CER) gains over the baseline, for Hispanic and Indian accents, respectively.
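The multitask arrangement described above can be illustrated with a minimal sketch: a shared encoder feeds two task-specific output heads, one for the main (English) task and one for the secondary (native-language) task. Everything here is an assumption for illustration; the paper's actual model is an end-to-end RNN, which this dense-layer stand-in does not reproduce, and all dimensions and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM = 40    # e.g. filterbank features per frame (assumed)
HIDDEN = 64      # shared encoder width (assumed)
EN_CHARS = 30    # English output character set size (assumed)
L1_CHARS = 35    # native-language character set size (assumed)

# Shared encoder parameters (a single dense layer standing in for the RNN)
W_enc = rng.normal(0, 0.1, (FEAT_DIM, HIDDEN))
# Task-specific output heads; in the paper's proposed setting the
# secondary head would instead share the English output set.
W_en = rng.normal(0, 0.1, (HIDDEN, EN_CHARS))
W_l1 = rng.normal(0, 0.1, (HIDDEN, L1_CHARS))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(frames, task):
    """Encode frames with the shared layer, then apply the head
    for the requested task ('en' or 'l1')."""
    h = np.tanh(frames @ W_enc)        # shared representation
    W_out = W_en if task == "en" else W_l1
    return softmax(h @ W_out)          # per-frame character posteriors

# One utterance of 100 frames routed through each head
x = rng.normal(size=(100, FEAT_DIM))
p_en = forward(x, "en")
p_l1 = forward(x, "l1")
print(p_en.shape, p_l1.shape)  # (100, 30) (100, 35)
```

During training, gradients from both tasks update the shared encoder, which is how the native-language data can improve the English acoustic representation; only the head weights are task-specific.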


DOI: 10.21437/Interspeech.2018-1378

Cite as: Ghorbani, S., Hansen, J.H. (2018) Leveraging Native Language Information for Improved Accented Speech Recognition. Proc. Interspeech 2018, 2449-2453, DOI: 10.21437/Interspeech.2018-1378.


@inproceedings{Ghorbani2018,
  author={Shahram Ghorbani and John H.L. Hansen},
  title={Leveraging Native Language Information for Improved Accented Speech Recognition},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={2449--2453},
  doi={10.21437/Interspeech.2018-1378},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1378}
}