Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Growing an n-Gram Language Model

Vesa Siivola (1), Bryan L. Pellom (2)

(1) Helsinki University of Technology, Finland; (2) University of Colorado at Boulder, USA

Traditionally, when building an n-gram model, we decide the span of the model history, collect the relevant statistics, and estimate the model. The model can then be pruned down to a smaller size by manipulating either the statistics or the estimated model. This paper shows how an n-gram model can instead be built by adding suitable sets of n-grams to a unigram model until the desired complexity is reached. Since the proposed technique eliminates the need to handle the full unpruned model, very high-order n-grams can be used. We compare our growing method to entropy-based pruning. In Finnish speech recognition tests, the models trained by the growing method outperform entropy-pruned models of similar size.
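The growing idea described above — start from a unigram model and admit higher-order n-grams only while a complexity budget allows — can be sketched roughly as follows. This is an illustrative sketch, not the authors' exact method: the greedy per-n-gram likelihood-gain criterion, the function names, and the thresholds are all assumptions made for the example.

```python
# Hypothetical sketch of growing an n-gram model from a unigram model.
# An n-gram of order n is considered only if its (n-1)-gram prefix was
# already accepted; it is added when its training log-likelihood gain
# over the shorter-context prediction is positive and the size budget
# (max_ngrams) has not been exhausted. Criterion and names are
# illustrative assumptions, not the paper's exact algorithm.
from collections import Counter
import math

def grow_ngram_model(tokens, max_order=4, max_ngrams=50, gain_threshold=0.0):
    total = len(tokens)
    # Start from the unigram model: maximum-likelihood word probabilities.
    model = {(w,): c / total for w, c in Counter(tokens).items()}
    for order in range(2, max_order + 1):
        span = total - order + 1
        grams = Counter(tuple(tokens[i:i + order]) for i in range(span))
        contexts = Counter(tuple(tokens[i:i + order - 1]) for i in range(span))
        candidates = []
        for g, c in grams.items():
            if g[:-1] not in model:          # only extend accepted n-grams
                continue
            p_new = c / contexts[g[:-1]]     # ML conditional probability
            # Probability the shorter model assigns to the final word.
            p_old = model.get(g[1:], model[(g[-1],)])
            gain = c * (math.log(p_new) - math.log(p_old))
            if gain > gain_threshold:
                candidates.append((gain, g, p_new))
        # Admit the highest-gain candidates while the budget allows.
        for gain, g, p in sorted(candidates, reverse=True):
            if len(model) >= max_ngrams:
                return model
            model[g] = p
    return model
```

Because each order only extends n-grams already in the model, the full unpruned high-order model is never materialized — the point the abstract makes about enabling very high-order n-grams.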

Full Paper

Bibliographic reference.  Siivola, Vesa / Pellom, Bryan L. (2005): "Growing an n-gram language model", in Proc. INTERSPEECH-2005, pp. 1309-1312.