ISCA Archive Eurospeech 1999

Assessment of smoothing methods and complex stochastic language modeling

Sven Martin, Christoph Hamacher, Jorg Liermann, Frank Wessel, Hermann Ney

This paper studies the overall effect of language modeling on perplexity and word error rate, starting from a trigram model with a standard smoothing method up to complex state-of-the-art language models: (1) We compare different smoothing methods, namely linear vs. absolute discounting, interpolation vs. backing-off, and back-off functions based on relative frequencies vs. singleton events. (2) We show the effect of complex language model techniques by using distant-trigrams and automatically selected word classes and word phrases using a maximum likelihood criterion (i.e. minimum perplexity). (3) We show the overall gain of the combined application of the above techniques, as opposed to their separate assessment in past publications. (4) We give perplexity and word error rate results on the North American Business corpus (NAB) with a training text of about 240 million words and on the German Verbmobil corpus.
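To make the smoothing terminology in point (1) concrete, the following is a minimal sketch of absolute discounting combined with interpolation, reduced to a bigram model for brevity. It is a textbook-style illustration, not the formulation used in the paper, which the abstract does not spell out. The function name `train_bigram_absolute_discounting`, the discount value `d = 0.5`, and the choice of unigram relative frequencies as the lower-order distribution are all assumptions made for this example.

```python
from collections import Counter


def train_bigram_absolute_discounting(tokens, d=0.5):
    """Return a function prob(w, v) estimating p(w | v).

    Sketch of an interpolated bigram model with absolute discounting.
    Assumes 0 < d < 1 and uses unigram relative frequencies as the
    lower-order distribution (an assumption of this example).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = sum(unigrams.values())

    history_counts = Counter()       # how often v occurs as a history
    distinct_successors = Counter()  # number of distinct words seen after v
    for (v, _w), count in bigrams.items():
        history_counts[v] += count
        distinct_successors[v] += 1

    def prob(w, v):
        p_unigram = unigrams[w] / total
        n_v = history_counts[v]
        if n_v == 0:
            # Unseen history: fall back to the unigram estimate.
            return p_unigram
        # Subtract a constant d from every seen bigram count ...
        discounted = max(bigrams[(v, w)] - d, 0.0) / n_v
        # ... and redistribute the freed mass via the unigram distribution.
        freed_mass = d * distinct_successors[v] / n_v
        return discounted + freed_mass * p_unigram

    return prob


if __name__ == "__main__":
    prob = train_bigram_absolute_discounting("a b a c a b".split())
    for w in ("a", "b", "c"):
        print(w, round(prob(w, "a"), 4))
```

Linear discounting would instead scale every count by a constant factor, and a back-off variant would consult the lower-order distribution only for bigrams with zero count rather than adding it to every estimate.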


doi: 10.21437/Eurospeech.1999-426

Cite as: Martin, S., Hamacher, C., Liermann, J., Wessel, F., Ney, H. (1999) Assessment of smoothing methods and complex stochastic language modeling. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 1939-1942, doi: 10.21437/Eurospeech.1999-426

@inproceedings{martin99b_eurospeech,
  author={Sven Martin and Christoph Hamacher and Jorg Liermann and Frank Wessel and Hermann Ney},
  title={{Assessment of smoothing methods and complex stochastic language modeling}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={1939--1942},
  doi={10.21437/Eurospeech.1999-426}
}