Untranscribed Web Audio for Low Resource Speech Recognition

Andrea Carmantini, Peter Bell, Steve Renals


Speech recognition models are highly susceptible to mismatch between the training and evaluation data in both the acoustic and language domains. For low-resource languages, it is difficult to obtain transcribed speech in the target domain, whereas untranscribed data can be collected with minimal effort. Recently, a method applying lattice-free maximum mutual information (LF-MMI) to untranscribed data has been found to be effective for semi-supervised training. However, weaker initial models and domain mismatch can result in high deletion rates for the semi-supervised model. We therefore propose a method that forces the base model to overgenerate possible transcriptions, relying on the ability of LF-MMI to deal with uncertainty. On data from the IARPA MATERIAL programme, our new semi-supervised method outperforms the standard semi-supervised approach, yielding significant gains when adapting to mismatched bandwidth and domain.


DOI: 10.21437/Interspeech.2019-2623

Cite as: Carmantini, A., Bell, P., Renals, S. (2019) Untranscribed Web Audio for Low Resource Speech Recognition. Proc. Interspeech 2019, 226-230, DOI: 10.21437/Interspeech.2019-2623.


@inproceedings{Carmantini2019,
  author={Andrea Carmantini and Peter Bell and Steve Renals},
  title={{Untranscribed Web Audio for Low Resource Speech Recognition}},
  year={2019},
  booktitle={Proc. Interspeech 2019},
  pages={226--230},
  doi={10.21437/Interspeech.2019-2623},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2623}
}