Sixth International Conference on Spoken Language Processing (ICSLP 2000)

Beijing, China
October 16-20, 2000

Particle-Based Language Modelling

E. W. D. Whittaker, P. C. Woodland

Cambridge University Engineering Department, UK

This paper investigates the use of particle (sub-word) N-grams for language modelling. One linguistics-based and two data-driven algorithms are presented and evaluated in terms of perplexity for Russian and English. Interpolating word trigram and particle 6-gram models gives up to a 7.5% perplexity reduction over the baseline word trigram model for Russian. Lattice rescoring experiments are also performed on 1997 DARPA Hub4 evaluation lattices where the interpolated model gives a 0.4% absolute reduction in word error rate over the baseline word trigram model.
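
The abstract does not spell out how the two models are combined or scored, so the following is only a minimal sketch under stated assumptions: that interpolation is linear at the word level, that a word's probability under the particle model is the product of its particle probabilities, and that perplexity is computed per word. The function names, the weight `lam`, and all numbers are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math


def particle_word_prob(particle_probs):
    """Word probability under a particle model (assumed: product of the
    conditional probabilities of the particles that make up the word)."""
    return math.prod(particle_probs)


def interpolate(p_word, p_particle, lam):
    """Linearly interpolate two per-word probability streams.

    lam is the (hypothetical) weight given to the word-level model.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(p_word) != len(p_particle):
        raise ValueError("both models must score the same words")
    return [lam * pw + (1.0 - lam) * pp for pw, pp in zip(p_word, p_particle)]


def perplexity(probs):
    """Perplexity: exp of the negative mean log-probability per word."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))


def relative_reduction(baseline, new):
    """Relative reduction of `new` over `baseline`, in percent."""
    return 100.0 * (baseline - new) / baseline


if __name__ == "__main__":
    # Toy per-word probabilities for a three-word test text (invented).
    word_model = [0.10, 0.02, 0.05]
    particle_model = [
        particle_word_prob([0.4, 0.3]),       # word split into two particles
        particle_word_prob([0.5, 0.2, 0.6]),  # word split into three particles
        particle_word_prob([0.04]),           # word kept as a single particle
    ]
    mixed = interpolate(word_model, particle_model, lam=0.6)
    print("word model perplexity:  ", perplexity(word_model))
    print("interpolated perplexity:", perplexity(mixed))
    print("relative reduction (%): ",
          relative_reduction(perplexity(word_model), perplexity(mixed)))
```

In practice an interpolation weight of this kind would typically be tuned on held-out data rather than fixed by hand. Note also that the perplexity figure in the abstract reads as a relative reduction, whereas the word error rate figure is explicitly stated as absolute.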

Bibliographic reference.  Whittaker, E. W. D. / Woodland, P. C. (2000): "Particle-based language modelling", In ICSLP-2000, vol.1, 170-173.