EUROSPEECH 2003 - INTERSPEECH 2003
This paper presents a method for reducing the effort of transcribing user utterances when developing language models for conversational speech recognition, for the case where a small number of transcribed utterances and a large number of untranscribed utterances are available. The recognition hypotheses for the untranscribed utterances are classified according to their confidence scores, and hypotheses with high confidence are used to enhance language model training. Utterances that receive low confidence can be scheduled for manual transcription first, so that they improve the language model. Experiments using automatic transcription of the untranscribed user utterances show that the proposed methods improve recognition accuracy while reducing the manual transcription effort required.
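The confidence-based split described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hypothesis` record, the `split_by_confidence` function, the threshold of 0.8, and the lowest-confidence-first ordering of the transcription queue are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hypothesis:
    """One recognizer output for an untranscribed utterance (hypothetical record)."""
    utterance_id: str
    text: str
    confidence: float  # assumed to be an utterance-level score in [0, 1]


def split_by_confidence(
    hyps: List[Hypothesis], threshold: float = 0.8  # threshold value is an assumption
) -> Tuple[List[Hypothesis], List[Hypothesis]]:
    """Partition hypotheses into (accepted for LM training, queue for manual transcription)."""
    # High-confidence hypotheses are kept as automatic transcripts.
    accepted = [h for h in hyps if h.confidence >= threshold]
    # Low-confidence utterances are queued, least confident first (an assumed ordering).
    queue = sorted(
        (h for h in hyps if h.confidence < threshold),
        key=lambda h: h.confidence,
    )
    return accepted, queue


if __name__ == "__main__":
    hyps = [
        Hypothesis("u1", "show me flights to boston", 0.95),
        Hypothesis("u2", "uh boston", 0.30),
        Hypothesis("u3", "weather in tokyo", 0.60),
    ]
    accepted, queue = split_by_confidence(hyps)
    print([h.utterance_id for h in accepted])
    print([h.utterance_id for h in queue])
```

In such a setup, the texts in `accepted` would be pooled with the manually transcribed utterances as language model training data, while the utterances in `queue` would be passed to human transcribers in that order.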
Bibliographic reference. Nakano, Mikio / Hazen, Timothy J. (2003): "Using untranscribed user utterances for improving language models based on confidence scoring", In EUROSPEECH-2003, 417-420.