This paper addresses the adjustment of the language model (LM) scaling factor of an automatic speech recognition (ASR) system for a new domain using only untranscribed speech. The main idea is to replace the (unavailable) reference transcript with an automatic transcript generated by an independent ASR system, and to adjust parameters against this sloppy reference. It is shown that, despite the sloppy reference's fairly high word error rate (ca. 35%), choosing the scaling factor to minimize disagreement with the erroneous transcripts is still an effective recipe for model selection. This effectiveness is demonstrated by adapting an ASR system trained on Broadcast News to transcribe the MIT Lectures corpus: an ASR system for telephone speech produces the sloppy reference, and optimizing towards it yields a nearly optimal LM scaling factor for the MIT Lectures corpus.
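The recipe described above can be sketched in a few lines: sweep candidate LM scaling factors, decode with each, score the 1-best output against the sloppy automatic reference, and keep the scale with the lowest disagreement. This is a minimal illustration, not the authors' implementation; the `decode` callable is a hypothetical stand-in for an actual ASR decoder, and disagreement is measured here as word-level edit distance normalized by reference length.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]


def wer(ref, hyp):
    """Disagreement rate: edit distance over reference length."""
    return edit_distance(ref, hyp) / max(1, len(ref))


def select_lm_scale(decode, utterances, sloppy_refs, scales):
    """Pick the LM scale whose decodes disagree least with the sloppy references.

    `decode(utterance, scale)` is a hypothetical decoder interface returning
    a list of hypothesized words for one utterance at a given LM scale.
    """
    best_scale, best_err = None, float("inf")
    for s in scales:
        errs = [wer(ref, decode(u, s))
                for u, ref in zip(utterances, sloppy_refs)]
        avg = sum(errs) / len(errs)
        if avg < best_err:
            best_err, best_scale = avg, s
    return best_scale, best_err
```

Even though the sloppy reference itself is erroneous, the scale that minimizes disagreement with it tends to track the scale that would minimize true WER, which is the paper's central observation.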
Bibliographic reference. White, Christopher M. / Rastrow, Ariya / Khudanpur, Sanjeev / Jelinek, Frederick (2009): "Unsupervised estimation of the language model scaling factor", In INTERSPEECH-2009, 1195-1198.