Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Combining Nonlocal, Syntactic and N-Gram Dependencies in Language Modeling

Jun Wu, Sanjeev Khudanpur

Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD, USA

A new language model is presented that combines local N-gram dependencies with two important sources of long-range dependencies: the syntactic structure and the topic of a sentence. These dependencies, or constraints, are integrated using the maximum entropy method. Substantial improvements over a trigram model are demonstrated in both perplexity and speech recognition accuracy on the Switchboard task. Topic dependencies are shown to be most useful in predicting words that are semantically related through the subject matter of the conversation. Syntactic dependencies, on the other hand, are found to be most helpful in positions where the best predictors of the following word are not within N-gram range because of an intervening phrase or clause. It is also shown that these two methods individually enhance an N-gram model in complementary ways, and that the overall improvement from their combination is nearly additive.
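To make the idea of integrating several constraint types in one maximum entropy model concrete, the following is a minimal sketch of a conditional log-linear model over a toy vocabulary. It is not the authors' implementation: the vocabulary, the feature templates (`unigram`, `bigram`, `head`, `topic`), the weights, and the function names are all hypothetical and chosen only for illustration. In a real system the weights would be estimated from training data rather than set by hand.

```python
import math

# Hypothetical toy vocabulary; not taken from the paper.
VOCAB = ["stocks", "fell", "rose", "the"]

# Hypothetical feature weights. Each key names one constraint type:
# local N-gram context, a syntactic head word, or the conversation topic.
WEIGHTS = {
    ("unigram", "the"): 0.5,
    ("bigram", "prices", "fell"): 1.2,
    ("head", "stocks", "fell"): 0.8,
    ("head", "stocks", "rose"): 0.7,
    ("topic", "finance", "stocks"): 1.0,
}


def active_features(word, prev_word, head_word, topic):
    """Return the features that fire for a candidate word in a given context."""
    return [
        ("unigram", word),
        ("bigram", prev_word, word),
        ("head", head_word, word),
        ("topic", topic, word),
    ]


def maxent_prob(word, prev_word, head_word, topic):
    """P(word | context) as a normalized exponential of summed feature weights."""
    def score(candidate):
        feats = active_features(candidate, prev_word, head_word, topic)
        # Features without a learned weight contribute nothing.
        return sum(WEIGHTS.get(f, 0.0) for f in feats)

    # Normalize over the whole vocabulary so the probabilities sum to one.
    z = sum(math.exp(score(candidate)) for candidate in VOCAB)
    return math.exp(score(word)) / z


if __name__ == "__main__":
    # The head word "stocks" may lie outside N-gram range of the predicted verb,
    # e.g. in "stocks that analysts recommended last week fell".
    for w in VOCAB:
        print(w, round(maxent_prob(w, "week", "stocks", "finance"), 4))
```

In this sketch the preceding word "week" carries no useful weight for predicting "fell", but the syntactic head "stocks" does, which mirrors the kind of position where the abstract reports syntactic dependencies to be most helpful.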


Bibliographic reference. Wu, Jun / Khudanpur, Sanjeev (1999): "Combining nonlocal, syntactic and n-gram dependencies in language modeling", in EUROSPEECH'99, 2179-2182.