ISCA Archive Eurospeech 2001

Smoothing issues in the structured language model

Woosung Kim, Sanjeev Khudanpur, Jun Wu

The Structured Language Model (SLM) recently introduced by Chelba and Jelinek is a powerful general formalism for exploiting syntactic dependencies in a left-to-right language model for applications such as speech and handwriting recognition, spelling correction, and machine translation. Unlike for traditional N-gram models, optimal smoothing techniques -- discounting methods and hierarchical structures for back-off -- are still being developed for the SLM. In the SLM, the statistical dependencies of a word on immediately preceding words, preceding syntactic heads, non-terminal labels, etc., are parameterized as overlapping N-gram dependencies. Statistical dependencies in the parser and tagger used by the SLM also have N-gram-like structure. Deleted interpolation has previously been used to combine these N-gram-like models. We demonstrate on two different corpora -- WSJ and Switchboard -- that more recent modified back-off strategies and nonlinear interpolation methods considerably lower the perplexity of the SLM. Improvement in word error rate is also demonstrated on the Switchboard corpus.
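
For readers unfamiliar with deleted interpolation, the baseline smoothing scheme mentioned above, the following is a minimal sketch on a plain word bigram model, not the SLM itself and not the authors' implementation. It assumes a single interpolation weight `lam` shared by all histories and estimated by EM on held-out text; the function names and the toy setup are hypothetical and chosen only for illustration.

```python
from collections import Counter


def train_counts(tokens):
    """Collect unigram, bigram, and history counts from a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    # A history is any token that is followed by another token.
    histories = Counter(tokens[:-1])
    return unigrams, bigrams, histories


def ml_probs(word, prev, unigrams, bigrams, histories):
    """Maximum-likelihood unigram and bigram estimates for (prev, word)."""
    total = sum(unigrams.values())
    p_uni = unigrams[word] / total
    p_bi = bigrams[(prev, word)] / histories[prev] if histories[prev] else 0.0
    return p_uni, p_bi


def interpolated_prob(word, prev, lam, unigrams, bigrams, histories):
    """Linear interpolation: lam * P_ML(word | prev) + (1 - lam) * P_ML(word)."""
    p_uni, p_bi = ml_probs(word, prev, unigrams, bigrams, histories)
    return lam * p_bi + (1.0 - lam) * p_uni


def estimate_lambda(heldout, unigrams, bigrams, histories, iters=20):
    """Estimate the weight by EM on held-out text (assumption: one shared weight)."""
    lam = 0.5
    pairs = list(zip(heldout, heldout[1:]))
    for _ in range(iters):
        posterior_sum, used = 0.0, 0
        for prev, word in pairs:
            p_uni, p_bi = ml_probs(word, prev, unigrams, bigrams, histories)
            mix = lam * p_bi + (1.0 - lam) * p_uni
            if mix == 0.0:
                continue  # word unseen in training; skipped in this sketch
            # E-step: posterior that the bigram component generated the word.
            posterior_sum += lam * p_bi / mix
            used += 1
        if used == 0:
            break
        # M-step: the new weight is the average posterior.
        lam = posterior_sum / used
    return lam
```

Estimating the weight on held-out rather than training text is the point of the method: on the training data itself the maximum-likelihood bigram estimate would always be favored. Practical systems typically tie weights by history count rather than using one global value, which this sketch omits.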

doi: 10.21437/Eurospeech.2001-216

Cite as: Kim, W., Khudanpur, S., Wu, J. (2001) Smoothing issues in the structured language model. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 717-720, doi: 10.21437/Eurospeech.2001-216

  author={Woosung Kim and Sanjeev Khudanpur and Jun Wu},
  title={{Smoothing issues in the structured language model}},
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},