8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Using Untranscribed User Utterances for Improving Language Models Based on Confidence Scoring

Mikio Nakano (1), Timothy J. Hazen (2)

(1) NTT Corporation, Japan
(2) Massachusetts Institute of Technology, USA

This paper presents a method for reducing the effort of transcribing user utterances to develop language models for conversational speech recognition when only a small number of transcribed utterances and a large number of untranscribed utterances are available. The recognition hypotheses for the untranscribed utterances are classified according to their confidence scores, so that hypotheses with high confidence are used to enhance language model training. Utterances that receive low confidence can be scheduled for manual transcription first to improve the language model. Experiments using automatic transcriptions of the untranscribed user utterances show that the proposed methods improve recognition accuracy while reducing the manual transcription effort required.
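
The confidence-based split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Hypothesis` record, the `partition_by_confidence` function, the single `accept_threshold` value of 0.8, and the assumption that confidence scores lie in [0, 1] are all hypothetical choices made for illustration. The abstract does not say how confidence is computed or which thresholds are used.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hypothesis:
    """One recognizer output for an untranscribed utterance (hypothetical record)."""
    utterance_id: str
    text: str          # top recognition hypothesis
    confidence: float  # utterance-level confidence, assumed to lie in [0, 1]


def partition_by_confidence(
    hyps: List[Hypothesis],
    accept_threshold: float = 0.8,  # assumed value, not taken from the paper
) -> Tuple[List[Hypothesis], List[Hypothesis]]:
    """Split hypotheses into language model training text and a transcription queue.

    Hypotheses at or above the threshold are kept as automatic transcriptions.
    The rest are returned sorted by ascending confidence, so the least reliable
    utterances come first in the manual transcription queue.
    """
    accepted = [h for h in hyps if h.confidence >= accept_threshold]
    rejected = [h for h in hyps if h.confidence < accept_threshold]
    queue = sorted(rejected, key=lambda h: h.confidence)
    return accepted, queue
```

Under these assumptions, the texts in `accepted` would be added to the manually transcribed data before retraining the language model, while `queue` gives the order in which a human transcriber would work through the remaining utterances.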


Bibliographic reference. Nakano, Mikio / Hazen, Timothy J. (2003): "Using untranscribed user utterances for improving language models based on confidence scoring", in EUROSPEECH-2003, 417-420.