Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Language Model Size Reduction by Pruning and Clustering

Joshua Goodman (1), Jianfeng Gao (2)

(1) Speech Technology Group, Microsoft Research, Redmond, WA, USA
(2) Natural Language Group, Microsoft Research China, Beijing, China

Several techniques are known for reducing the size of language models, including count cutoffs [1], Weighted Difference pruning [2], Stolcke pruning [3], and clustering [4]. We compare all of these techniques and show some surprising results. For instance, at low pruning thresholds, Weighted Difference and Stolcke pruning underperform count cutoffs. We then show novel clustering techniques that can be combined with Stolcke pruning to produce the smallest models at a given perplexity. The resulting models can be a factor of three or more smaller than models pruned with Stolcke pruning, at the same perplexity. The technique creates clustered models that are often larger than the unclustered models, but which can be pruned to models that are smaller than unclustered models with the same perplexity.
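
As a rough illustration of the simplest technique named above, count cutoffs, the sketch below drops every n-gram whose training count does not exceed a threshold. It is not taken from the paper: the function names (`count_ngrams`, `apply_cutoff`), the toy sentence, and the convention that an n-gram is kept only when its count is strictly greater than the cutoff are all assumptions made for illustration. A real backoff model would also redistribute probability mass to lower-order estimates, which this sketch omits.

```python
from collections import Counter


def count_ngrams(tokens, n):
    """Count every n-gram (stored as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def apply_cutoff(counts, cutoff):
    """Keep only n-grams seen more than `cutoff` times (assumed convention)."""
    return {ngram: c for ngram, c in counts.items() if c > cutoff}


# Hypothetical toy corpus, used only to show the size reduction.
tokens = "the cat sat on the mat and the cat ran".split()
bigrams = count_ngrams(tokens, 2)
pruned = apply_cutoff(bigrams, cutoff=1)

print(len(bigrams), "distinct bigrams before the cutoff")
print(len(pruned), "distinct bigrams after the cutoff")
```

Raising the cutoff shrinks the stored model at the cost of falling back to lower-order estimates more often, which is the size versus perplexity trade-off the abstract compares across techniques.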


  1. F. Jelinek, "Self-Organized Language Modeling for Speech Recognition", in Readings in Speech Recognition, A. Waibel and K. F. Lee (Eds.), Morgan Kaufmann, 1990.
  2. K. Seymore, R. Rosenfeld, "Scalable Backoff Language Models", Proc. ICSLP, Vol. 1, pp. 232-235, Philadelphia, 1996.
  3. A. Stolcke, "Entropy-based Pruning of Backoff Language Models", Proc. DARPA Broadcast News Transcription and Understanding Workshop, pp. 270-274, Lansdowne, VA, 1998.
  4. P. F. Brown, V. J. Della Pietra, P. V. deSouza, J. C. Lai, R. L. Mercer, "Class-based n-gram Models of Natural Language", Computational Linguistics, Vol. 18, pp. 467-479, 1992.

Bibliographic reference. Goodman, Joshua / Gao, Jianfeng (2000): "Language model size reduction by pruning and clustering", In ICSLP-2000, vol. 3, 110-113.