Multi-Task Learning Using Mismatched Transcription for Under-Resourced Speech Recognition

Van Hai Do, Nancy F. Chen, Boon Pang Lim, Mark Hasegawa-Johnson


It is challenging to obtain large amounts of native (matched) labels for audio in under-resourced languages, whether because the language has few literate speakers or because it lacks a universally acknowledged orthography. One solution is to increase the amount of labeled data by using mismatched transcription, in which transcribers who do not speak the language (in place of native speakers) write down what they hear as nonsense speech in their own language (e.g., Mandarin). This paper presents a multi-task learning framework in which a DNN acoustic model is trained simultaneously on a limited amount of native (matched) transcription and a larger set of mismatched transcription. We find that this multi-task framework yields improvements over monolingual baselines and over previously proposed mismatched-transcription adaptation techniques. In addition, we show that using alignments provided by a GMM adapted with mismatched transcription further improves acoustic modeling performance. Experiments on Georgian data from the IARPA Babel program demonstrate the effectiveness of the proposed method.
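The core idea of the abstract — one shared acoustic representation feeding two task-specific softmax heads (matched and mismatched targets) trained with a weighted joint loss — can be sketched as follows. All layer sizes, the task weight alpha, and the target inventories are illustrative assumptions for a single-hidden-layer toy model, not the paper's actual DNN configuration.

```python
import numpy as np

# Illustrative dimensions (assumptions, not the paper's setup).
rng = np.random.default_rng(0)
n_feats = 13        # acoustic feature dimension per frame (e.g., MFCCs)
n_hidden = 32       # shared hidden layer
n_matched = 10      # native (matched) phone targets
n_mismatched = 20   # mismatched (e.g., Mandarin syllable) targets

# Shared hidden layer plus two task-specific output heads.
W_shared = rng.normal(scale=0.1, size=(n_feats, n_hidden))
W_matched = rng.normal(scale=0.1, size=(n_hidden, n_matched))
W_mismatched = rng.normal(scale=0.1, size=(n_hidden, n_mismatched))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(x):
    """Shared representation feeds both softmax heads."""
    h = np.tanh(x @ W_shared)
    return softmax(h @ W_matched), softmax(h @ W_mismatched)

def multitask_loss(x, y_matched, y_mismatched, alpha=0.5):
    """Weighted sum of the two frame-level cross-entropy losses.

    alpha balances the matched and mismatched tasks; in practice it
    would be tuned on held-out data.
    """
    p_m, p_mm = forward(x)
    idx = np.arange(len(x))
    ce_matched = -np.log(p_m[idx, y_matched]).mean()
    ce_mismatched = -np.log(p_mm[idx, y_mismatched]).mean()
    return alpha * ce_matched + (1 - alpha) * ce_mismatched

# A small batch of random "frames" with labels for both tasks.
x = rng.normal(size=(4, n_feats))
loss = multitask_loss(x, np.array([0, 1, 2, 3]), np.array([5, 6, 7, 8]))
```

In this shared-parameter scheme the mismatched task acts as a regularizer on the hidden representation, which is why extra non-native labels can help even when only the matched head is used at test time.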


DOI: 10.21437/Interspeech.2017-788

Cite as: Do, V.H., Chen, N.F., Lim, B.P., Hasegawa-Johnson, M. (2017) Multi-Task Learning Using Mismatched Transcription for Under-Resourced Speech Recognition. Proc. Interspeech 2017, 734-738, DOI: 10.21437/Interspeech.2017-788.


@inproceedings{Do2017,
  author={Van Hai Do and Nancy F. Chen and Boon Pang Lim and Mark Hasegawa-Johnson},
  title={Multi-Task Learning Using Mismatched Transcription for Under-Resourced Speech Recognition},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={734--738},
  doi={10.21437/Interspeech.2017-788},
  url={http://dx.doi.org/10.21437/Interspeech.2017-788}
}