ISCA Archive Interspeech 2009

A self-labeling speech corpus: collecting spoken words with an online educational game

Ian McGraw, Alexander Gruenstein, Andrew Sutherland

We explore a new approach to collecting and transcribing speech data by using online educational games. One such game, Voice Race, elicited over 55,000 utterances over a 22-day period, representing 18.7 hours of speech. Voice Race was designed such that the transcripts for a significant subset of utterances can be automatically inferred using the contextual constraints of the game. Game context can also be used to simplify transcription to a multiple-choice task, which can be performed by non-experts. We found that one third of the speech collected with Voice Race could be automatically transcribed with over 98% accuracy, and that an additional 49% could be labeled cheaply by Amazon Mechanical Turk workers. We demonstrate the utility of the self-labeled speech in an acoustic model adaptation task, which resulted in a reduction in the Voice Race utterance error rate. The collected utterances cover a wide variety of vocabulary, and should be useful across a range of research.


doi: 10.21437/Interspeech.2009-561

Cite as: McGraw, I., Gruenstein, A., Sutherland, A. (2009) A self-labeling speech corpus: collecting spoken words with an online educational game. Proc. Interspeech 2009, 3031-3034, doi: 10.21437/Interspeech.2009-561

@inproceedings{mcgraw09_interspeech,
  author={Ian McGraw and Alexander Gruenstein and Andrew Sutherland},
  title={{A self-labeling speech corpus: collecting spoken words with an online educational game}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={3031--3034},
  doi={10.21437/Interspeech.2009-561}
}