Typically data collection, transcription, language model generation, and deployment are separate phases of creating a spoken language interface. An unfortunate consequence of this is that the recognizer usually remains a static element of systems often deployed in dynamic environments. By providing an API for human intelligence, Amazon Mechanical Turk changes the way system developers can construct spoken language systems. In this work, we describe an architecture that automates and connects these four phases, effectively allowing the developer to grow a spoken language interface. In particular, we show that a human-in-the-loop programming paradigm, in which workers transcribe utterances behind the scenes, can alleviate the need for expert guidance in language model construction. We demonstrate the utility of these organic language models in a voice-search interface for photographs.
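The core loop described above, in which worker transcriptions are folded back into the recognizer's language model during deployment, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the unigram model, the `OrganicLanguageModel` class, and the `crowd_transcribe` stand-in for an AMT transcription task are all hypothetical simplifications.

```python
from collections import Counter

class OrganicLanguageModel:
    """Hypothetical sketch: a unigram LM grown from worker transcripts."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def add_transcript(self, text):
        # Each newly transcribed utterance updates the model in place,
        # so the deployed recognizer's LM "grows" with usage.
        words = text.lower().split()
        self.counts.update(words)
        self.total += len(words)

    def probability(self, word):
        # Maximum-likelihood unigram estimate (no smoothing).
        if self.total == 0:
            return 0.0
        return self.counts[word.lower()] / self.total

def crowd_transcribe(audio_id):
    # Stand-in for a crowd transcription task (e.g. an AMT HIT);
    # returns canned text here so the sketch is self-contained.
    fake_transcripts = {
        "utt1": "show photos from the beach",
        "utt2": "show photos of my dog",
    }
    return fake_transcripts[audio_id]

# Simulated pipeline: utterances collected in deployment are
# transcribed behind the scenes and fed back into the LM.
lm = OrganicLanguageModel()
for utt in ["utt1", "utt2"]:
    lm.add_transcript(crowd_transcribe(utt))

print(lm.probability("photos"))  # "photos" appears 2 times in 10 words
```

In a real system the transcription step would be an asynchronous call to a crowdsourcing API, and the language model would be an n-gram or class-based model rebuilt or interpolated periodically rather than a raw unigram counter.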
Bibliographic reference: McGraw, Ian / Glass, James / Seneff, Stephanie (2011): "Growing a spoken language interface on Amazon Mechanical Turk". In Proc. INTERSPEECH 2011, 3057–3060.