8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Language Model Adaptation Using Cross-Lingual Information

Woosung Kim, Sanjeev Khudanpur

Johns Hopkins University, USA

The success of statistical language modeling techniques is crucially dependent on the availability of a large amount of training text. For a language in which such large text collections are not available, methods have recently been proposed to take advantage of a resource-rich language, together with cross-lingual information retrieval and machine translation, to sharpen language models for the resource-deficient language. In this paper, we describe investigations into such language models for an automatic speech recognition system for Mandarin Broadcast News. By exploiting a large side-corpus of contemporaneous English news articles to adapt a static Chinese language model to the news story being transcribed, we demonstrate significant improvements in recognition accuracy. The improvement from using English text is greater when less Chinese text is available to estimate the static language model. We also compare our cross-lingual adaptation to monolingual topic-dependent language model adaptation, and achieve further gains by combining the two adaptation techniques.
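
The abstract does not spell out how the static model is adapted, so the following is only an illustrative sketch of one common approach: linearly interpolating a static model probability with a story-specific unigram distribution projected from retrieved English text. The translation table, the romanized tokens, the function names, and the interpolation weight `lam` are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def cross_lingual_unigram(english_tokens, translation_table):
    """Project English word frequencies onto target-language words.

    translation_table maps an English word e to {target_word: p(target | e)}.
    This table is a hypothetical stand-in for a machine translation lexicon.
    English words missing from the table contribute no probability mass.
    """
    counts = Counter(english_tokens)
    total = sum(counts.values())
    projected = {}
    for e_word, count in counts.items():
        for t_word, p_t_given_e in translation_table.get(e_word, {}).items():
            projected[t_word] = projected.get(t_word, 0.0) + p_t_given_e * count / total
    return projected


def adapted_prob(word, p_static, cross_unigram, lam=0.3):
    """Linear interpolation of the story-specific unigram with a static probability.

    p_static is the probability assigned to `word` by the static model in its
    context; lam is an assumed mixing weight, normally tuned on held-out data.
    """
    return lam * cross_unigram.get(word, 0.0) + (1.0 - lam) * p_static


# Toy example: tokens from English articles assumed to match the story.
table = {
    "market": {"shichang": 1.0},
    "bank": {"yinhang": 0.5, "an": 0.5},
}
story_unigram = cross_lingual_unigram(["market", "bank", "market", "bank"], table)
boosted = adapted_prob("shichang", p_static=0.1, cross_unigram=story_unigram)
```

In this toy setup, a word that is frequent in the aligned English articles receives a higher adapted probability than the static model alone would give it, which is the intuition behind using a contemporaneous side-corpus.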


Bibliographic reference. Kim, Woosung / Khudanpur, Sanjeev (2003): "Language model adaptation using cross-lingual information", in EUROSPEECH-2003, 3129-3132.