Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Practical Language Modeling: An Interpolating Method

Xiaohu Liu, Douglas O'Shaughnessy

INRS-Telecommunications, University of Quebec, Canada

Language modeling is a key component of speech and handwriting recognition, and N-gram models are the formalism of choice across a wide range of domains. Although a high order N can greatly reduce perplexity, in many practical cases it is unrealistic to obtain statistically reliable N-gram estimates. We propose an interpolated model that introduces signal words and clue words into the baseline N-gram model. The initial word of a word pair with high mutual information is chosen as a signal word. Similarly, words that have high mutual information with a certain morphological form are defined as clue words. In a given context, we select the signal word with the highest score to compute the probability of the current word, and the clue word with the highest score to estimate the probability of the form of the current word. We discuss the basic requirements for designing an interpolating language model and examine how our models satisfy them. Compared to the baseline model, we obtain a considerable reduction in perplexity. Because both signal words and clue words are easy to collect and handle, the proposed method is practical.
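
The abstract does not give formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that "mutual information" can be approximated by pointwise mutual information over ordered word pairs within a fixed window, that a candidate signal word's score is the highest such value among pairs it begins, and that the interpolation is linear with a fixed weight. The function names, the window size, and the weight `lam` are all hypothetical.

```python
import math
from collections import Counter


def pmi_pairs(sentences, window=5):
    """Estimate pointwise mutual information for ordered word pairs.

    A pair (w, v) is counted when v follows w within `window` positions.
    Assumption: PMI stands in for the paper's mutual information measure.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for sent in sentences:
        for i, w in enumerate(sent):
            word_counts[w] += 1
            for v in sent[i + 1:i + 1 + window]:
                pair_counts[(w, v)] += 1
    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())
    pmi = {}
    for (w, v), c in pair_counts.items():
        p_pair = c / total_pairs
        p_w = word_counts[w] / total_words
        p_v = word_counts[v] / total_words
        pmi[(w, v)] = math.log2(p_pair / (p_w * p_v))
    return pmi


def signal_scores(pmi):
    """Score each initial word by the best PMI of any pair it begins (assumed scoring)."""
    scores = {}
    for (w, _v), value in pmi.items():
        scores[w] = max(value, scores.get(w, float("-inf")))
    return scores


def best_signal_word(history, scores):
    """Pick the highest-scoring candidate signal word in the context, or None."""
    candidates = [w for w in history if w in scores]
    return max(candidates, key=lambda w: scores[w]) if candidates else None


def interpolate(p_baseline, p_signal, lam=0.7):
    """Linear interpolation of a baseline N-gram probability with a
    signal-word-conditioned probability. `lam` is a hypothetical weight."""
    return lam * p_baseline + (1.0 - lam) * p_signal
```

In practice the weight would be tuned on held-out data rather than fixed, and a clue-word component for morphological forms could be combined in the same way.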


Bibliographic reference. Liu, Xiaohu / O'Shaughnessy, Douglas (2000): "Practical language modeling: an interpolating method", in ICSLP-2000, vol. 3, 354-357.