ISCA Archive Interspeech 2021

Using Large Self-Supervised Models for Low-Resource Speech Recognition

Krishna D. N, Pinyi Wang, Bruno Bozza

Recently, self-supervised pre-training has shown significant improvements in many areas of machine learning, including speech and NLP. Self-supervised models are trained on large amounts of unlabelled data to learn higher-level representations for downstream tasks. In this work, we investigate the effectiveness of several self-supervised pre-trained models for low-resource speech recognition. We adopt pre-trained wav2vec2.0 [1] models for the speech recognition task for three Indian languages: Telugu, Tamil, and Gujarati. We examine both English and multilingual pre-trained models. Our experiments show that fine-tuning the multilingual pre-trained model achieves an average relative reduction in WER of 2.88% compared to the previous state-of-the-art supervised method. We carefully analyze the generalization capability of multilingual pre-trained models for both seen and unseen languages. We also show that fine-tuning with only 25% of the training data yields WER competitive with the previous best methods.
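For reference, the relative WER reduction reported in the abstract is the improvement expressed as a percentage of the baseline WER, averaged over the three languages. A minimal sketch of the computation; the per-language WER values below are purely illustrative placeholders, not numbers from the paper:

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative reduction in WER, as a percentage of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer * 100.0

# Hypothetical per-language WERs (illustrative only, not the paper's results).
baseline = {"Telugu": 25.0, "Tamil": 27.0, "Gujarati": 24.0}
finetuned = {"Telugu": 24.2, "Tamil": 26.3, "Gujarati": 23.3}

reductions = [relative_wer_reduction(baseline[lang], finetuned[lang])
              for lang in baseline]
avg = sum(reductions) / len(reductions)
print(f"average relative WER reduction: {avg:.2f}%")
```

Averaging the per-language relative reductions (rather than pooling errors across languages) weights each language equally regardless of test-set size.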

doi: 10.21437/Interspeech.2021-631

Cite as: N, K.D., Wang, P., Bozza, B. (2021) Using Large Self-Supervised Models for Low-Resource Speech Recognition. Proc. Interspeech 2021, 2436-2440, doi: 10.21437/Interspeech.2021-631

@inproceedings{n21_interspeech,
  author={Krishna D. N and Pinyi Wang and Bruno Bozza},
  title={{Using Large Self-Supervised Models for Low-Resource Speech Recognition}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={2436--2440},
  doi={10.21437/Interspeech.2021-631}
}