The goal of statistical language modeling is to estimate probabilities for arbitrary word sequences. To obtain non-zero probabilities for unseen events, the distributions estimated from the training data need to be smoothed. In the widely used Kneser-Ney family of smoothing algorithms, this is achieved by absolute discounting. The discount parameters are conventionally computed with closed-form approximation formulas that maximize the leaving-one-out log-likelihood of the training data.
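As background (not taken from the paper itself), the standard leaving-one-out approximation sets a single absolute discount to D = n1 / (n1 + 2·n2), where n_k is the number of distinct n-grams occurring exactly k times in the training data. A minimal sketch of this baseline estimator, with an invented toy bigram count table:

```python
from collections import Counter

def absolute_discount(ngram_counts):
    """Standard leaving-one-out approximation for the absolute
    discount: D = n1 / (n1 + 2 * n2), where n_k is the number of
    distinct n-grams seen exactly k times in training."""
    count_of_counts = Counter(ngram_counts.values())
    n1, n2 = count_of_counts[1], count_of_counts[2]
    return n1 / (n1 + 2 * n2)

# Hypothetical bigram counts from a tiny corpus (illustration only).
bigram_counts = {
    ("the", "cat"): 1, ("the", "dog"): 2, ("a", "cat"): 1,
    ("a", "dog"): 1, ("the", "end"): 2,
}
D = absolute_discount(bigram_counts)  # n1 = 3, n2 = 2 -> D = 3/7
```

The paper's point of departure is that such closed-form estimates can be suboptimal; it instead proposes estimating the discounts on held-out data.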
In this work, we outline several shortcomings of the standard estimators for the discount parameters. We propose an efficient method for computing the discount values on held-out data and analyze the resulting parameter estimates. Experiments on large English and French corpora show consistent improvements in perplexity and word error rate over the baseline method. At the same time, this approach can be used for language model pruning, leading to slightly better results than standard pruning algorithms.
Bibliographic reference: Sundermeyer, Martin / Schlüter, Ralf / Ney, Hermann (2011): "On the estimation of discount parameters for language model smoothing", in Proc. INTERSPEECH 2011, pp. 1433-1436.