ISCA Archive ICSLP 2000

Language model size reduction by pruning and clustering

Joshua Goodman, Jianfeng Gao

Several techniques are known for reducing the size of language models, including count cutoffs [1], Weighted Difference pruning [2], Stolcke pruning [3], and clustering [4]. We compare all of these techniques and show some surprising results. For instance, at low pruning thresholds, Weighted Difference and Stolcke pruning underperform count cutoffs. We then show novel clustering techniques that can be combined with Stolcke pruning to produce the smallest models at a given perplexity. The resulting models can be a factor of three or more smaller than models pruned with Stolcke pruning, at the same perplexity. The technique creates clustered models that are often larger than the unclustered models, but which can be pruned to models that are smaller than unclustered models with the same perplexity.

[1] F. Jelinek, "Self Organized Language modeling for Speech Recognition", in Readings in Speech Recognition, A. Waibel and K. F. Lee (Eds.), Morgan Kaufmann, 1990.
[2] K. Seymore, R. Rosenfeld, "Scalable backoff language models", Proc. ICSLP, Vol. 1, pp. 232-235, Philadelphia, 1996.
[3] A. Stolcke, "Entropy-based Pruning of Backoff Language Models", Proc. DARPA News Transcription and Understanding Workshop, pp. 270-274, Lansdowne, VA, 1998.
[4] P. F. Brown, V. J. Della Pietra, P. V. deSouza, J. C. Lai, R. L. Mercer, "Class-based n-gram models of natural language", Computational Linguistics, 1990 (18), 467-479.

Cite as: Goodman, J., Gao, J. (2000) Language model size reduction by pruning and clustering. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 3, 110-113

  author={Joshua Goodman and Jianfeng Gao},
  title={{Language model size reduction by pruning and clustering}},
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 3, 110-113}