Improved Accented Speech Recognition Using Accent Embeddings and Multi-task Learning

Abhinav Jain, Minali Upreti, Preethi Jyothi


One of the major remaining challenges for modern English automatic speech recognition (ASR) systems is handling speech from users with a diverse set of accents. ASR systems trained on speech from multiple English accents still underperform when confronted with a new accent. In this work, we explore how to use accent embeddings and multi-task learning to improve recognition of accented speech. We propose a multi-task architecture that jointly learns an accent classifier and a multi-accent acoustic model. We also consider augmenting the speech input with accent information in the form of embeddings extracted by a separate network. Together, these techniques give significant relative performance improvements of 15% and 10% over a multi-accent baseline system on test sets containing seen and unseen accents, respectively.
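
The two ideas in the abstract, joint training with an accent classifier and augmenting the input with accent embeddings, can be illustrated with a minimal sketch. This is not the authors' implementation. The weighted-sum form of the joint loss, the `accent_weight` parameter, the per-frame concatenation, and all function names are assumptions made for illustration; the abstract does not specify them. The last helper only shows how a relative improvement is computed from a baseline and a new error rate.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability of the target class index."""
    return -math.log(softmax(logits)[target])


def augment_frames(frames, accent_embedding):
    """Append one utterance-level accent embedding to every feature frame.

    Assumption: the embedding is concatenated to each frame; the abstract
    only says the speech input is augmented with accent embeddings.
    """
    return [list(frame) + list(accent_embedding) for frame in frames]


def multitask_loss(acoustic_logits, acoustic_target,
                   accent_logits, accent_target, accent_weight=0.1):
    """Joint loss: acoustic-model loss plus a weighted accent-classifier loss.

    Assumption: the two tasks are combined as a weighted sum, and
    accent_weight is a hypothetical hyperparameter.
    """
    primary = cross_entropy(acoustic_logits, acoustic_target)
    auxiliary = cross_entropy(accent_logits, accent_target)
    return primary + accent_weight * auxiliary


def relative_improvement(baseline_error, new_error):
    """Relative reduction in error rate with respect to the baseline."""
    return (baseline_error - new_error) / baseline_error
```

In a real system both sets of logits would come from a shared network with two output layers, so that gradients from the accent classifier also shape the acoustic model's hidden representations.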


DOI: 10.21437/Interspeech.2018-1864

Cite as: Jain, A., Upreti, M., Jyothi, P. (2018) Improved Accented Speech Recognition Using Accent Embeddings and Multi-task Learning. Proc. Interspeech 2018, 2454-2458, DOI: 10.21437/Interspeech.2018-1864.


@inproceedings{Jain2018,
  author={Abhinav Jain and Minali Upreti and Preethi Jyothi},
  title={Improved Accented Speech Recognition Using Accent Embeddings and Multi-task Learning},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={2454--2458},
  doi={10.21437/Interspeech.2018-1864},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1864}
}