ISCA Archive Interspeech 2015

Pruning sparse non-negative matrix n-gram language models

Joris Pelemans, Noam Shazeer, Ciprian Chelba

In this paper we present a pruning algorithm and experimental results for our recently proposed Sparse Non-negative Matrix (SNM) family of language models (LMs). We show that when trained with only n-gram features, SNM LM pruning based on a mutual information criterion yields the best known pruned model on the One Billion Word Language Model Benchmark, reducing perplexity by 18% and 57% relative to Katz and Kneser-Ney LMs, respectively. We also present a method for converting an SNM LM to ARPA back-off format, which can be readily used in a single-pass decoder for Automatic Speech Recognition.
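The pruning criterion itself is not reproduced on this page. As a rough illustration only, the Python sketch below ranks n-gram features by a mutual-information-style expected log-likelihood loss and keeps the highest-scoring fraction; the function name, input structures, and scoring formula are assumptions for illustration and not the paper's actual SNM criterion.

```python
import math
from collections import Counter

def prune_ngram_features(ngram_counts, backoff_logprob, full_logprob, keep_fraction=0.5):
    """Keep the top fraction of n-gram features by a mutual-information-style score.

    Hypothetical sketch: each n-gram (history, word) is scored by the expected
    log-likelihood lost if the feature is dropped and prediction falls back to
    a lower-order estimate. Inputs are illustrative, not the paper's data
    structures.
    """
    total = sum(ngram_counts.values())
    scored = []
    for (history, word), count in ngram_counts.items():
        p_joint = count / total
        # p(h, w) * [log p(w | h) - log p_backoff(w | h)]
        score = p_joint * (full_logprob[(history, word)] - backoff_logprob[(history, word)])
        scored.append((score, history, word))
    scored.sort(reverse=True)
    kept = scored[: max(1, int(len(scored) * keep_fraction))]
    return {(h, w) for _, h, w in kept}

# Toy usage with made-up counts and probabilities.
counts = Counter({(("the",), "cat"): 3, (("a",), "dog"): 1})
full = {k: math.log(0.4) for k in counts}
back = {k: math.log(0.1) for k in counts}
print(prune_ngram_features(counts, back, full, keep_fraction=0.5))
```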


doi: 10.21437/Interspeech.2015-343

Cite as: Pelemans, J., Shazeer, N., Chelba, C. (2015) Pruning sparse non-negative matrix n-gram language models. Proc. Interspeech 2015, 1433-1437, doi: 10.21437/Interspeech.2015-343

@inproceedings{pelemans15_interspeech,
  author={Joris Pelemans and Noam Shazeer and Ciprian Chelba},
  title={{Pruning sparse non-negative matrix n-gram language models}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={1433--1437},
  doi={10.21437/Interspeech.2015-343}
}