Sixth International Conference on Spoken Language Processing (ICSLP 2000)

Beijing, China
October 16-20, 2000

Hierarchical Statistical Language Models: Experiments on In-Domain Adaptation

Lucian Galescu, James Allen

University of Rochester, USA

We introduce a hierarchical statistical language model, represented as a collection of local models plus a general sentence model. We provide an example that combines a trigram general model with a PFSA (probabilistic finite-state automaton) local model for the class of decimal numbers, described in terms of sub-word units (graphemes). This model effectively extends the vocabulary of the overall model to an infinite size, yet it still performs better than a word-based model.
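The combination of a general model and a grapheme-level local model can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the automaton's states and arc probabilities, the class tag "<NUM>", the end marker "</n>", the function names number_logprob and sentence_logprob, and the factorization that multiplies the general model's probability of the class tag by the local model's probability of the grapheme string. The general trigram model is passed in as a hypothetical callable rather than implemented.

```python
import math

DIGITS = set("0123456789")

# Hypothetical PFSA over graphemes for decimal numbers such as "42" or "3.14".
# Each state maps an arc label to (next_state, probability); the probabilities
# leaving a state sum to 1. "</n>" is an assumed end-of-number marker.
PFSA = {
    "start": {"digit": ("int", 1.0)},
    "int": {"digit": ("int", 0.5), ".": ("point", 0.3), "</n>": ("end", 0.2)},
    "point": {"digit": ("frac", 1.0)},
    "frac": {"digit": ("frac", 0.4), "</n>": ("end", 0.6)},
}


def number_logprob(graphemes):
    """Log probability of a grapheme string under the local model (-inf if rejected)."""
    state, logp = "start", 0.0
    for ch in list(graphemes) + ["</n>"]:
        arc = "digit" if ch in DIGITS else ch
        # Assumption: the ten digits are equally likely on a "digit" arc.
        emit = 0.1 if arc == "digit" else 1.0
        arcs = PFSA.get(state, {})
        if arc not in arcs:
            return float("-inf")
        state, p = arcs[arc]
        logp += math.log(p * emit)
    return logp if state == "end" else float("-inf")


def sentence_logprob(tokens, general_logprob):
    """Score a sentence: the general model sees "<NUM>" in place of each number,
    and the local model scores the number's spelling.

    general_logprob(token, history) is a hypothetical trigram scorer that takes
    the current token and a tuple of the two preceding tokens.
    """
    total, history = 0.0, ["<s>", "<s>"]
    for tok in tokens:
        local = number_logprob(tok)
        if local > float("-inf"):
            cls = "<NUM>"
            total += local
        else:
            cls = tok
        total += general_logprob(cls, tuple(history[-2:]))
        history.append(cls)
    return total
```

Because the automaton accepts digit strings of any length, the local model assigns a nonzero probability to numbers never seen in training, which is the sense in which the vocabulary becomes unbounded. For example, sentence_logprob(["pay", "3.14", "dollars"], scorer) asks the general model only about "pay", "<NUM>", and "dollars".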

Using in-domain language model adaptation experiments, we show that local models can encode enough linguistic information, if well trained, that they may be ported to new language models without re-estimation.



Bibliographic reference. Galescu, Lucian / Allen, James (2000): "Hierarchical statistical language models: experiments on in-domain adaptation", in ICSLP-2000, vol. 1, 186-189.