We explore a new approach to collecting and transcribing speech data by using online educational games. One such game, Voice Race, elicited more than 55,000 utterances over a 22-day period, representing 18.7 hours of speech. Voice Race was designed such that the transcripts for a significant subset of utterances can be automatically inferred using the contextual constraints of the game. Game context can also be used to simplify transcription to a multiple-choice task, which can be performed by non-experts. We found that one third of the speech collected with Voice Race could be automatically transcribed with over 98% accuracy, and that an additional 49% could be labeled cheaply by Amazon Mechanical Turk workers. We demonstrate the utility of the self-labeled speech in an acoustic model adaptation task, which resulted in a reduction in the Voice Race utterance error rate. The collected utterances cover a wide variety of vocabulary, and should be useful across a range of research.
Bibliographic reference. McGraw, Ian / Gruenstein, Alexander / Sutherland, Andrew (2009): "A self-labeling speech corpus: collecting spoken words with an online educational game", In INTERSPEECH-2009, 3031-3034.