We introduce a number of techniques designed to help integrate semantic knowledge with N-gram language models for automatic speech recognition. Our techniques allow us to integrate Latent Semantic Analysis (LSA), a word-similarity algorithm based on word co-occurrence information, with N-gram models. While LSA is good at predicting content words that are coherent with the rest of a text, it is a poor predictor of frequent words, has a low dynamic range, and is inaccurate when combined linearly with N-grams. We show that modifying the dynamic range, applying a per-word confidence metric, and using geometric rather than linear combination with N-grams produces a more robust language model with lower perplexity on a Wall Street Journal test set than a baseline N-gram model.
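As a rough illustration of the geometric combination the abstract refers to, the Python sketch below takes a weighted geometric mean of an N-gram distribution and an LSA distribution over the vocabulary and renormalizes it. The function name, the toy probabilities, and the scalar weight lam are all hypothetical; the paper's actual per-word confidence weighting (which would replace the scalar lam) is not reproduced here.

import numpy as np

def geometric_interpolate(p_ngram, p_lsa, lam=0.7):
    # Weighted geometric mean of the two distributions over the
    # vocabulary, renormalized so the result sums to one. With a
    # linear mix, a near-zero LSA probability would barely change
    # the combined score; geometrically, it pulls it down sharply.
    combined = (p_ngram ** lam) * (p_lsa ** (1.0 - lam))
    return combined / combined.sum()

# Toy 4-word vocabulary with illustrative (made-up) probabilities.
p_ngram = np.array([0.5, 0.3, 0.15, 0.05])
p_lsa = np.array([0.1, 0.2, 0.4, 0.3])
print(geometric_interpolate(p_ngram, p_lsa))

In this toy example, a per-word lam close to 1 would defer to the N-gram model (e.g. for frequent function words, where the abstract notes LSA predicts poorly), while a smaller lam would let the LSA distribution reward semantically coherent content words.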
Cite as: Coccaro, N., Jurafsky, D. (1998) Towards better integration of semantic predictors in statistical language modeling. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0852, doi: 10.21437/ICSLP.1998-642
@inproceedings{coccaro98_icslp,
  author={Noah Coccaro and Daniel Jurafsky},
  title={{Towards better integration of semantic predictors in statistical language modeling}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0852},
  doi={10.21437/ICSLP.1998-642}
}