Sixth International Conference on Spoken Language Processing
Language modeling is a key component of speech and handwriting recognition, and N-gram modeling is the formalism of choice across a wide range of domains. Although a higher order N can reduce perplexity greatly, in many practical cases it is unrealistic to obtain statistically reliable estimates for high-order N-grams. We propose an interpolated model that introduces signal words and clue words into the baseline N-gram model. A signal word is the first word of a word pair with high mutual information. Similarly, a clue word is a word that has high mutual information with a particular morphological form. In a given context, we select the signal word with the highest score to compute the probability of the current word, and the clue word with the highest score to estimate the probability of the current word's morphological form. We discuss the basic requirements for designing an interpolating language model and show how our models satisfy them. The proposed model achieves a considerable reduction in perplexity compared with the baseline model. Because both signal words and clue words are easy to collect and handle, the proposed method is practical.
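The signal-word idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's scoring function, co-occurrence window, or interpolation weights, so everything specific here is an assumption made for illustration: the class name `SignalWordModel`, the use of pointwise mutual information over ordered pairs within a fixed window, the threshold `pmi_threshold`, the fixed weight `lam`, and an unsmoothed bigram model as the baseline. Clue words and morphological forms are not modeled.

```python
import math
from collections import Counter


class SignalWordModel:
    """Bigram model interpolated with a signal-word model (illustrative only)."""

    def __init__(self, tokens, window=3, pmi_threshold=0.0, lam=0.7):
        self.lam = lam  # assumed fixed interpolation weight for the bigram term
        self.unigrams = Counter(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        # Ordered pairs (first, second): second follows first within `window`.
        self.pairs = Counter()
        for i, first in enumerate(tokens):
            for second in tokens[i + 1 : i + 1 + window]:
                self.pairs[(first, second)] += 1
        self.n_words = len(tokens)
        self.n_pairs = sum(self.pairs.values())
        # Pair count per first word, used to normalize P(word | signal).
        self.pair_totals = Counter()
        for (first, _), count in self.pairs.items():
            self.pair_totals[first] += count
        # Assumed score: a signal word's best PMI with any following word.
        self.signal_scores = {}
        for first, second in self.pairs:
            score = self.pmi(first, second)
            best = self.signal_scores.get(first, -math.inf)
            if score > pmi_threshold and score > best:
                self.signal_scores[first] = score

    def pmi(self, first, second):
        """Pointwise mutual information of an ordered pair; -inf if unseen."""
        joint = self.pairs[(first, second)]
        if joint == 0:
            return -math.inf
        p_joint = joint / self.n_pairs
        p_first = self.unigrams[first] / self.n_words
        p_second = self.unigrams[second] / self.n_words
        return math.log2(p_joint / (p_first * p_second))

    def prob(self, word, history):
        """Interpolated P(word | history); falls back to the bigram alone."""
        prev = history[-1]
        prev_count = self.unigrams[prev]
        p_bigram = self.bigrams[(prev, word)] / prev_count if prev_count else 0.0
        candidates = [h for h in history if h in self.signal_scores]
        if not candidates:
            return p_bigram
        # Select the signal word in the context with the highest score.
        signal = max(candidates, key=self.signal_scores.__getitem__)
        p_signal = self.pairs[(signal, word)] / self.pair_totals[signal]
        return self.lam * p_bigram + (1 - self.lam) * p_signal


tokens = "the cat sat on the mat the cat ate the fish".split()
model = SignalWordModel(tokens)
print(model.prob("sat", ["the", "cat"]))
```

In this sketch the signal word is scored independently of the candidate word, so the signal-word term is a proper conditional distribution and the interpolated probabilities still sum to one over the vocabulary whenever the bigram term does. A practical system would also smooth both components, which is omitted here.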
Bibliographic reference. Liu, Xiaohu / O'Shaughnessy, Douglas (2000): "Practical language modeling: an interpolating method", In ICSLP-2000, vol.3, 354-357.