ISCA Archive ICSLP 1998

Towards better integration of semantic predictors in statistical language modeling

Noah Coccaro, Daniel Jurafsky

We introduce several techniques for integrating semantic knowledge with N-gram language models for automatic speech recognition. These techniques let us combine Latent Semantic Analysis (LSA), a word-similarity algorithm based on word co-occurrence information, with N-gram models. While LSA is good at predicting content words that are coherent with the rest of a text, it is a poor predictor of frequent words, has a low dynamic range, and is inaccurate when combined linearly with N-grams. We show that modifying the dynamic range, applying a per-word confidence metric, and using geometric rather than linear combinations with N-grams produce a more robust language model, one with lower perplexity on a Wall Street Journal test set than a baseline N-gram model.
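
The three ideas named in the abstract (widening the dynamic range, weighting by a per-word confidence, and combining geometrically instead of linearly) can be illustrated with a minimal sketch. The abstract does not give the paper's actual formulas, so everything below is an assumption made for illustration: the function names, the exponent `gamma`, the confidence values, and the toy probabilities are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the exact formulation used in the paper is not
# stated in the abstract. All names and numbers here are hypothetical.

def linear_mix(p_ngram, p_lsa, lam=0.5):
    """Linear interpolation: lam * P_ngram + (1 - lam) * P_lsa."""
    return {w: lam * p_ngram[w] + (1 - lam) * p_lsa[w] for w in p_ngram}

def sharpen(p, gamma):
    """Widen a distribution's dynamic range by raising each probability
    to the power gamma (> 1) and renormalizing."""
    raised = {w: v ** gamma for w, v in p.items()}
    z = sum(raised.values())
    return {w: v / z for w, v in raised.items()}

def geometric_mix(p_ngram, p_lsa, confidence):
    """Geometric combination with a per-word weight.

    confidence[w] in [0, 1] is the (assumed) trust placed in the LSA
    estimate for word w; 0 means the N-gram estimate is used alone.
    The product is renormalized so the result sums to 1.
    """
    scores = {
        w: p_ngram[w] ** (1 - confidence[w]) * p_lsa[w] ** confidence[w]
        for w in p_ngram
    }
    z = sum(scores.values())
    return {w: s / z for w, s in scores.items()}

# Toy three-word vocabulary (made-up numbers).
p_ngram = {"the": 0.6, "stock": 0.3, "banana": 0.1}
p_lsa = {"the": 0.34, "stock": 0.36, "banana": 0.30}   # nearly flat
confidence = {"the": 0.0, "stock": 0.5, "banana": 0.5}  # distrust LSA on "the"

p_lsa_sharp = sharpen(p_lsa, gamma=5.0)
combined = geometric_mix(p_ngram, p_lsa_sharp, confidence)
baseline = linear_mix(p_ngram, p_lsa)
```

In this toy setup the frequent word "the" gets zero LSA weight, so its score comes from the N-gram model alone before renormalization, while the content words are adjusted by the sharpened LSA estimate.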


doi: 10.21437/ICSLP.1998-642

Cite as: Coccaro, N., Jurafsky, D. (1998) Towards better integration of semantic predictors in statistical language modeling. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0852, doi: 10.21437/ICSLP.1998-642

@inproceedings{coccaro98_icslp,
  author={Noah Coccaro and Daniel Jurafsky},
  title={{Towards better integration of semantic predictors in statistical language modeling}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0852},
  doi={10.21437/ICSLP.1998-642}
}