ISCA Archive Eurospeech 2001

Triggering individual word domains in n-gram language models

E. I. Sicilia-Garcia, Ji Ming, F. J. Smith

We present a new method of introducing domain knowledge into an n-gram language model. The method combines language models for individual word domains. Each word model is built from its own corpus, formed by extracting those subsets of the entire training corpus that contain the significant word in question. At test time, significant words are extracted from a cache and their models are combined with a global language model. Several methods of combining the models are described; one, which combines frequencies rather than probabilities, gives promising results and offers a relatively simple way of introducing domain information into an n-gram language model. It yields a 20% reduction in language model perplexity over the standard 3-gram approach, similar to results obtained with other, more complex domain models. The model also requires a smaller cache than other cache-based models.
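The idea of combining frequencies rather than probabilities can be illustrated with a minimal Python sketch. The abstract does not give the combination formula, so everything here is an assumption made for illustration: sentences are taken as the unit of extraction for a word's corpus, the counts of each triggered word model are simply added to the global counts with a hypothetical `weight` parameter, and no smoothing or back-off is applied. The function names are likewise hypothetical and are not taken from the paper.

```python
from collections import Counter


def ngram_counts(sentences, n=3):
    """Count n-grams and their (n-1)-word histories in tokenized sentences."""
    grams, hists = Counter(), Counter()
    for sent in sentences:
        for i in range(len(sent) - n + 1):
            gram = tuple(sent[i:i + n])
            grams[gram] += 1
            hists[gram[:-1]] += 1
    return grams, hists


def word_domain_corpus(sentences, word):
    """Assumption: a word's corpus is every sentence that contains the word."""
    return [s for s in sentences if word in s]


def combined_probability(gram, global_model, domain_models, weight=1.0):
    """Add weighted domain counts to the global counts, then normalize once.

    This is one possible reading of "combining frequencies"; the paper's
    actual formula may differ.
    """
    g_grams, g_hists = global_model
    num = g_grams[gram]
    den = g_hists[gram[:-1]]
    for d_grams, d_hists in domain_models:
        num += weight * d_grams[gram]
        den += weight * d_hists[gram[:-1]]
    return num / den if den else 0.0


# Toy training corpus (made up for this example).
corpus = [
    "the bank raised interest rates".split(),
    "the bank of the river flooded".split(),
    "the river bank was muddy".split(),
]
global_model = ngram_counts(corpus)

# Significant words assumed to be currently held in the cache.
cache_words = ["interest"]
domain_models = [ngram_counts(word_domain_corpus(corpus, w)) for w in cache_words]

gram = ("the", "bank", "raised")
p_global = combined_probability(gram, global_model, [])
p_combined = combined_probability(gram, global_model, domain_models)
```

In this toy setting, triggering the word model for "interest" raises the estimate for "raised" after "the bank" relative to the global model alone, which is the qualitative effect the abstract describes. A practical system would also need smoothing and a rule for deciding which words count as significant, neither of which is specified here.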


doi: 10.21437/Eurospeech.2001-212

Cite as: Sicilia-Garcia, E.I., Ming, J., Smith, F.J. (2001) Triggering individual word domains in n-gram language models. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 701-704, doi: 10.21437/Eurospeech.2001-212

@inproceedings{siciliagarcia01_eurospeech,
  author={E. I. Sicilia-Garcia and Ji Ming and F. J. Smith},
  title={{Triggering individual word domains in n-gram language models}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={701--704},
  doi={10.21437/Eurospeech.2001-212}
}