7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Using Cross-Language Cues for Story-Specific Language Modeling

Sanjeev Khudanpur, Woosung Kim

Johns Hopkins University, USA

We propose methods to exploit contemporary news articles in a resource-rich language, together with cross-language information retrieval and machine translation, to sharpen language models for a news story in a language with fewer linguistic resources. We report experimental results on story-specific Chinese language models that use cues from a parallel corpus of English news stories. We demonstrate that even with fairly crude cross-language information retrieval, level-1 machine translation and simple linear interpolation, a significant (18%) reduction in perplexity may be obtained over a Chinese trigram model. We also demonstrate that this method of sharpening the Chinese language model is complementary to other techniques such as topic-dependent modeling, and that the two in combination yield an even greater reduction in perplexity (28%).
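The interpolation step described in the abstract can be pictured as follows. This is a minimal illustrative sketch, not the authors' implementation: a story-specific model P_story (estimated, for example, from machine-translated English articles retrieved by cross-language IR for the story) is linearly interpolated with the baseline Chinese trigram model P_trigram, and perplexity is measured on the Chinese test story. The interpolation weight lam, the placeholder distributions, and the function names are assumptions for illustration only.

# Illustrative sketch (not the paper's code): linear interpolation of a
# story-specific model with a baseline trigram model, and the perplexity
# of the interpolated model on a tokenized test story.
import math

def interpolated_prob(word, history, p_story, p_trigram, lam=0.2):
    """P(w|h) = lam * P_story(w) + (1 - lam) * P_trigram(w|h)."""
    return lam * p_story(word) + (1.0 - lam) * p_trigram(word, history)

def perplexity(test_words, p_story, p_trigram, lam=0.2):
    """Perplexity of the interpolated model over a list of test tokens."""
    log_prob = 0.0
    for i, w in enumerate(test_words):
        h = tuple(test_words[max(0, i - 2):i])  # trigram history (up to 2 words)
        log_prob += math.log(interpolated_prob(w, h, p_story, p_trigram, lam))
    return math.exp(-log_prob / len(test_words))

# Toy usage with uniform placeholder distributions over a 1000-word vocabulary;
# real P_story and P_trigram would come from the retrieved/translated articles
# and the Chinese trigram model, respectively.
vocab_size = 1000
p_story = lambda w: 1.0 / vocab_size
p_trigram = lambda w, h: 1.0 / vocab_size
print(perplexity("这 是 一 个 测试".split(), p_story, p_trigram))  # prints 1000.0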



Bibliographic reference: Khudanpur, Sanjeev / Kim, Woosung (2002): "Using cross-language cues for story-specific language modeling", in Proc. ICSLP-2002, 513-516.