ISCA Archive ICSLP 2000

Statistical language modeling with a class based n-multigram model

Sabine Deligne

In this paper, we report on speech recognition experiments with an n-multigram language model, a stochastic model which assumes dependencies of length n between variable-length phrases. The n-multigram probabilities can be estimated in a class-based framework, where both the phrase distribution and the phrase classes are learned from the data according to a Maximum Likelihood criterion, using a generalized Expectation-Maximization algorithm. In our speech recognition experiments on a database of air travel reservations, the 2-multigram model reduces the word error rate by 19% compared with the usual trigram model, while using 25% fewer parameters. We also report on a scheme where some a priori information is introduced into the model via semantic tagging.
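
To illustrate what "dependencies between variable-length phrases" can mean in practice, the following is a minimal sketch, not the paper's implementation. It assumes that a 2-multigram conditions each phrase on the single preceding phrase, and that the likelihood of a word string is the sum over all of its segmentations into phrases. The function name, the `max_len` limit, the `<s>` start symbol, and the toy probability table are all hypothetical; the class-based estimation and the EM training described in the abstract are not shown.

```python
from functools import lru_cache


def multigram_likelihood(words, phrase_bigram, max_len=3, start="<s>"):
    """Likelihood of `words`, summed over all segmentations into phrases
    of at most `max_len` words, with each phrase conditioned on the
    previous phrase (assumed reading of a 2-multigram)."""
    words = tuple(words)
    n = len(words)

    @lru_cache(maxsize=None)
    def rest(pos, prev):
        # Likelihood of words[pos:] given that the last phrase was `prev`.
        if pos == n:
            return 1.0
        total = 0.0
        for k in range(1, max_len + 1):
            if pos + k > n:
                break
            phrase = words[pos:pos + k]
            # Unseen (prev, phrase) pairs get probability 0 in this sketch.
            p = phrase_bigram.get((prev, phrase), 0.0)
            if p > 0.0:
                total += p * rest(pos + k, phrase)
        return total

    return rest(0, (start,))


# Hypothetical toy table: p(phrase | previous phrase).
toy_table = {
    (("<s>",), ("a",)): 0.5,
    (("<s>",), ("a", "b")): 0.2,
    (("a",), ("b",)): 0.4,
}

# Two segmentations contribute: [a][b] and [a b].
likelihood = multigram_likelihood(["a", "b"], toy_table)
```

The memoized recursion visits each (position, previous phrase) pair once, so the sum over segmentations does not require enumerating them explicitly.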


Cite as: Deligne, S. (2000) Statistical language modeling with a class based n-multigram model. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 3, 119-122

@inproceedings{deligne00_icslp,
  author={Sabine Deligne},
  title={{Statistical language modeling with a class based n-multigram model}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  volume={3},
  pages={119--122}
}