Leveraging Translations for Speech Transcription in Low-resource Settings

Antonios Anastasopoulos, David Chiang


Recently proposed data collection frameworks for endangered language documentation aim not only to collect speech in the language of interest, but also to collect translations into a high-resource language that will render the collected resource interpretable. We focus on this scenario and explore whether we can improve transcription quality under these extremely low-resource settings with the assistance of text translations. We present a neural multi-source model and evaluate several variations of it on three low-resource datasets. We find that our multi-source model with shared attention outperforms the baselines, reducing transcription character error rate by up to 12.3%.
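The paper's model itself is a sequence-to-sequence network; as a rough illustration of the shared-attention idea the abstract names, the sketch below scores two encoders (speech frames and translation tokens) with a single attention parameter matrix, rather than one matrix per source as in a plain multi-source model. This is a minimal illustration only: the function and variable names (attend, W_shared) and the toy dimensions are hypothetical, not taken from the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, keys, values, W):
    # Bilinear attention: one score per source frame/token.
    scores = keys @ (W @ query)   # (T,)
    weights = softmax(scores)     # distribution over source positions
    return weights @ values       # context vector, (d,)

# Toy dimensions (hypothetical): d-dim hidden states throughout.
d = 8
rng = np.random.default_rng(0)

speech_enc = rng.normal(size=(50, d))  # encoder states for the speech input
trans_enc  = rng.normal(size=(12, d))  # encoder states for the translation
dec_state  = rng.normal(size=d)        # current decoder hidden state

# "Shared" attention: the same matrix W_shared scores both sources,
# instead of separate per-encoder attention parameters.
W_shared = rng.normal(size=(d, d))
c_speech = attend(dec_state, speech_enc, speech_enc, W_shared)
c_trans  = attend(dec_state, trans_enc,  trans_enc,  W_shared)

# The two contexts are combined (here, simple concatenation) before
# predicting the next transcription character.
context = np.concatenate([c_speech, c_trans])
print(context.shape)  # (16,)
```

Sharing the scoring parameters across sources keeps the parameter count low, which matters in the extremely low-resource settings the paper targets; how the two contexts are combined in the actual model is a design choice not shown here.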


DOI: 10.21437/Interspeech.2018-2162

Cite as: Anastasopoulos, A., Chiang, D. (2018) Leveraging Translations for Speech Transcription in Low-resource Settings. Proc. Interspeech 2018, 1279-1283, DOI: 10.21437/Interspeech.2018-2162.


@inproceedings{Anastasopoulos2018,
  author={Antonios Anastasopoulos and David Chiang},
  title={Leveraging Translations for Speech Transcription in Low-resource Settings},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={1279--1283},
  doi={10.21437/Interspeech.2018-2162},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2162}
}