Automatic closed-captioning of video is a useful application of speech recognition technology, but it poses numerous challenges when applied to open-domain user-uploaded videos such as those on YouTube. In this work, we explore a strategy to improve decoding accuracy for video transcription by decoding each video with a language model (LM) adapted specifically to the topics that the video covers. Taxonomic topic classifiers are used to determine the topic content of videos and to build a large set of topic-specific LMs from web documents. We consider strategies for selecting and interpolating LMs in both supervised and unsupervised scenarios in a two-pass lattice rescoring framework. Experiments on a YouTube video corpus show a 10% relative reduction in word error rate (WER) over generic single-pass transcriptions, as well as a statistically significant 2.5% reduction over rescoring with a very large non-adapted LM built from all the documents.
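To illustrate what interpolating topic-specific LMs could look like, the following is a minimal sketch and not the authors' implementation. It assumes toy unigram LMs stored as dictionaries and assumes that the interpolation weights are simply the normalized topic classifier scores for a video. The names `normalize_scores` and `interpolated_prob`, the topic labels, and all numbers are hypothetical.

```python
from typing import Dict


def normalize_scores(topic_scores: Dict[str, float]) -> Dict[str, float]:
    """Turn non-negative topic scores into weights that sum to 1."""
    total = sum(topic_scores.values())
    if total <= 0:
        raise ValueError("topic scores must have a positive sum")
    return {topic: score / total for topic, score in topic_scores.items()}


def interpolated_prob(
    word: str,
    topic_lms: Dict[str, Dict[str, float]],
    topic_scores: Dict[str, float],
) -> float:
    """Linear interpolation: P(w) = sum over topics t of weight_t * P_t(w).

    Assumption: each topic LM is a toy unigram table; a word missing
    from a table gets probability 0.0 (no smoothing in this sketch).
    """
    weights = normalize_scores(topic_scores)
    return sum(
        weight * topic_lms[topic].get(word, 0.0)
        for topic, weight in weights.items()
    )


# Hypothetical toy data: two topic LMs and classifier scores for one video.
toy_lms = {
    "sports": {"goal": 0.5, "vote": 0.5},
    "politics": {"goal": 0.1, "vote": 0.9},
}
toy_scores = {"sports": 3.0, "politics": 1.0}

p_goal = interpolated_prob("goal", toy_lms, toy_scores)
p_vote = interpolated_prob("vote", toy_lms, toy_scores)
```

In a lattice rescoring setting, the same weighting idea would apply to n-gram probabilities rather than to unigram tables; how the weights are actually chosen in the supervised and unsupervised scenarios is not specified here.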
Bibliographic reference. Thadani, Kapil / Biadsy, Fadi / Bikel, Dan (2012): "On-the-fly topic adaptation for YouTube video transcription", In INTERSPEECH-2012, 210-213.