This paper investigates the use of particle (sub-word) N-grams for language modelling. One linguistics-based and two data-driven algorithms are presented and evaluated in terms of perplexity for Russian and English. Interpolating word trigram and particle 6-gram models gives up to a 7.5% perplexity reduction over the baseline word trigram model for Russian. Lattice rescoring experiments are also performed on 1997 DARPA Hub4 evaluation lattices, where the interpolated model gives a 0.4% absolute reduction in word error rate over the baseline word trigram model.
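The central operation in the abstract, linearly interpolating a word trigram model with a particle 6-gram model, can be sketched as below. This is a minimal illustrative sketch, not the paper's implementation: word_trigram_prob, particle_ngram_prob, decompose, and the fixed weight lam are hypothetical stand-ins for smoothed n-gram estimates, a particle decomposition produced by one of the paper's algorithms, and a weight that would in practice be tuned on held-out data.

# Illustrative sketch only; all model functions are hypothetical stand-ins.
from math import log2

def interpolated_word_prob(word, history, word_trigram_prob,
                           particle_ngram_prob, decompose, lam=0.5):
    """P(w|h) = lam * P_word(w|h) + (1 - lam) * P_particle(w|h),
    where P_particle chains particle 6-gram probabilities over the
    particles of w, conditioned on particles of the word history."""
    # Word-level trigram probability, conditioned on the last two words.
    p_word = word_trigram_prob(word, tuple(history[-2:]))

    # Particle-level probability of the word: decompose it into particles
    # and multiply particle 6-gram probabilities (5 particles of context).
    context = [p for w in history for p in decompose(w)]
    p_particle = 1.0
    for particle in decompose(word):
        p_particle *= particle_ngram_prob(particle, tuple(context[-5:]))
        context.append(particle)

    return lam * p_word + (1.0 - lam) * p_particle

def perplexity(sentences, prob_fn):
    """Perplexity 2^(-(1/N) * sum_i log2 P(w_i | h_i)) over all words,
    the evaluation measure used in the paper."""
    log_prob, n_words = 0.0, 0
    for sentence in sentences:
        for i, word in enumerate(sentence):
            log_prob += log2(prob_fn(word, sentence[:i]))
            n_words += 1
    return 2.0 ** (-log_prob / n_words)

Linear interpolation is used here because it guarantees a valid probability for any weight in [0, 1], letting the particle model contribute where the word trigram is poorly estimated (e.g. for the rich morphology of Russian) without ever zeroing out the word-level estimate.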
Cite as: Whittaker, E.W.D., Woodland, P.C. (2000) Particle-based language modelling. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 1, 170-173
@inproceedings{whittaker00_icslp,
  author={E. W. D. Whittaker and P. C. Woodland},
  title={{Particle-based language modelling}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  volume={1},
  pages={170--173}
}