ISCA Archive ICSLP 2000

Particle-based language modelling

E. W. D. Whittaker, P. C. Woodland

This paper investigates the use of particle (sub-word) N-grams for language modelling. One linguistics-based and two data-driven algorithms are presented and evaluated in terms of perplexity for Russian and English. Interpolating word trigram and particle 6-gram models gives up to a 7.5% perplexity reduction over the baseline word trigram model for Russian. Lattice rescoring experiments are also performed on 1997 DARPA Hub4 evaluation lattices, where the interpolated model gives a 0.4% absolute reduction in word error rate over the baseline word trigram model.


Cite as: Whittaker, E.W.D., Woodland, P.C. (2000) Particle-based language modelling. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 1, 170-173

@inproceedings{whittaker00_icslp,
  author={E. W. D. Whittaker and P. C. Woodland},
  title={{Particle-based language modelling}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 1, 170-173}
}