Don’t Count on ASR to Transcribe for You: Breaking Bias with Two Crowds

Michael Levit, Yan Huang, Shuangyu Chang, Yifan Gong


A crowdsourcing approach for collecting high-quality speech transcriptions is presented. The approach addresses a typical weakness of traditional semi-supervised transcription strategies, which show ASR hypotheses to transcribers to help them cope with unclear or ambiguous audio and to speed up transcription. We explain how the traditional methods introduce a bias into transcriptions that makes it difficult to objectively measure system improvements against existing baselines, and suggest a two-stage crowdsourcing alternative that first iteratively collects transcription hypotheses and then asks a different crowd to pick the best of them. We show that this alternative not only outperforms the traditional method in a side-by-side comparison but also leads to ASR improvements, owing to the superior quality of acoustic and language models trained on the transcribed data.
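The two-stage idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how hypotheses are collected or how the second crowd's choices are aggregated, so the stopping rule (`max_hypotheses`), the text normalization, the plurality vote, the tie-breaking rule, and all function names below are assumptions made purely for illustration.

```python
from collections import Counter


def collect_hypotheses(transcribers, audio, max_hypotheses=3):
    """Stage 1 (assumed form): ask first-crowd workers one at a time and keep
    distinct transcriptions until max_hypotheses have been collected."""
    hypotheses = []
    for transcribe in transcribers:
        # Normalize case and whitespace so trivial variants count as one hypothesis.
        text = " ".join(transcribe(audio).lower().split())
        if text not in hypotheses:
            hypotheses.append(text)
        if len(hypotheses) >= max_hypotheses:
            break
    return hypotheses


def pick_best(hypotheses, judges, audio):
    """Stage 2 (assumed form): each second-crowd judge returns the index of the
    hypothesis they prefer; the plurality winner is returned, with ties going
    to the hypothesis collected earliest."""
    votes = Counter(judge(audio, hypotheses) for judge in judges)
    best_index = max(range(len(hypotheses)), key=lambda i: (votes[i], -i))
    return hypotheses[best_index]


# Toy stand-ins for human workers (hypothetical; real workers would listen to audio).
transcribers = [
    lambda audio: "play some jazz",
    lambda audio: "Play some  jazz",   # same words, different case and spacing
    lambda audio: "play sam jazz",
]
judges = [
    lambda audio, hyps: 0,
    lambda audio, hyps: 1,
    lambda audio, hyps: 0,
]

hyps = collect_hypotheses(transcribers, audio=None)
best = pick_best(hyps, judges, audio=None)
```

In this sketch neither crowd is shown an ASR hypothesis, and the workers who choose are separate from the workers who transcribe, which mirrors the separation described above.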


DOI: 10.21437/Interspeech.2017-164

Cite as: Levit, M., Huang, Y., Chang, S., Gong, Y. (2017) Don’t Count on ASR to Transcribe for You: Breaking Bias with Two Crowds. Proc. Interspeech 2017, 3941-3945, DOI: 10.21437/Interspeech.2017-164.


@inproceedings{Levit2017,
  author={Michael Levit and Yan Huang and Shuangyu Chang and Yifan Gong},
  title={Don’t Count on ASR to Transcribe for You: Breaking Bias with Two Crowds},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={3941--3945},
  doi={10.21437/Interspeech.2017-164},
  url={http://dx.doi.org/10.21437/Interspeech.2017-164}
}