
Sixth International Conference on Spoken Language Processing
(ICSLP 2000)
Beijing, China
October 16-20, 2000

Statistical Language Modeling with a Class Based N-Multigram Model
Sabine Deligne
IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
In this paper, we report on speech recognition experiments
with an n-multigram language model, a stochastic
model which assumes dependencies of length n between
variable-length phrases. The n-multigram probabilities
can be estimated in a class-based framework, where both
the phrase distribution and the phrase classes are learned
from the data according to a Maximum Likelihood criterion,
using a generalized Expectation-Maximization
algorithm. In our speech recognition experiments on a
database of air travel reservations, the 2-multigram model
reduces the word error rate by 19% relative to the usual
trigram model, while using 25% fewer parameters than the
trigram model. We also report on a scheme in which a priori
information is introduced into the model via semantic tagging.
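Because phrase boundaries are not observed, the probability of a word string under a 2-multigram model is a sum over all of its segmentations into variable-length phrases. The sketch below computes that sum with a standard forward-style dynamic program. It is an illustration under stated assumptions, not the paper's implementation: the names `multigram2_likelihood`, `START` and `PROBS` are hypothetical, the phrase probabilities are made-up toy values, phrases are capped at `max_len` words, and the class-based parameterization and generalized EM estimation described above are not shown.

```python
from collections import defaultdict

# Hypothetical start-of-sentence context; not a name from the paper.
START = ("<s>",)

def multigram2_likelihood(words, probs, max_len):
    """Likelihood of `words` under a toy 2-multigram model.

    Sums, over every segmentation of `words` into phrases of at most
    `max_len` words, the product of p(phrase | previous phrase).
    `probs` maps (previous_phrase, phrase) tuples to probabilities.
    """
    T = len(words)
    # alpha[t][ph]: total probability of words[:t] over all segmentations
    # whose last phrase is `ph` (START when t == 0).
    alpha = [defaultdict(float) for _ in range(T + 1)]
    alpha[0][START] = 1.0
    for t in range(1, T + 1):
        for length in range(1, min(max_len, t) + 1):
            phrase = tuple(words[t - length:t])
            # Extend every partial segmentation that ends at t - length.
            for prev, score in alpha[t - length].items():
                p = probs.get((prev, phrase), 0.0)
                if p > 0.0:
                    alpha[t][phrase] += score * p
    return sum(alpha[T].values())

# Toy, made-up probabilities for illustration only.
PROBS = {
    (START, ("to",)): 0.5,
    (START, ("to", "boston")): 0.2,
    (("to",), ("boston",)): 0.4,
}

if __name__ == "__main__":
    # Segmentations: [to][boston] gives 0.5 * 0.4, [to boston] gives 0.2
    print(multigram2_likelihood(["to", "boston"], PROBS, max_len=2))  # prints 0.4
```

With `max_len=1` only the word-by-word segmentation survives, so the same call returns 0.2. An EM-style estimator would combine this forward pass with a matching backward pass to collect expected phrase-pair counts; that step is omitted here.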
Bibliographic reference.
Deligne, Sabine (2000):
"Statistical language modeling with a class based n-multigram model",
In ICSLP-2000, vol. 3, 119-122.