ISCA Archive Eurospeech 1999

Putting language into language modeling

Frederick Jelinek, Ciprian Chelba

In this paper we describe the statistical Structured Language Model (SLM), which uses a grammatical analysis of the hypothesized sentence segment (prefix) to predict the next word. We first describe the operation of a basic, completely lexicalized SLM that builds up partial parses as it proceeds from left to right. We then develop a chart-parsing algorithm and, with its help, a method to compute the prediction probabilities $P(w_{i+1} \mid W_i)$. We suggest useful computational shortcuts, followed by a method of training the SLM parameters from text data. Finally, we introduce a more detailed parametrization that involves non-terminal labeling and considerably improves the smoothing of the SLM's statistical parameters. We conclude by presenting recognition and perplexity results achieved on standard corpora.
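To make the prediction step concrete, here is a minimal sketch, assuming the next-word probability is formed as a weighted mixture over a set of candidate partial parses $T_i$ of the prefix $W_i$: each parse contributes a prediction conditioned on its two most recent exposed headwords $(h_{-1}, h_0)$, weighted by $\rho(W_i, T_i) = P(W_i, T_i) / \sum_{T} P(W_i, T)$. The names `next_word_prob`, `toy_predictor`, and `TABLE`, the parse representation, and all probabilities are hypothetical illustrations, not the paper's implementation; the chart-parsing computation and parameter training are not shown.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation of one partial parse T_i of the prefix W_i:
# its joint probability P(W_i, T_i) and its two most recent exposed
# headwords (h_{-1}, h_0). This is an assumption, not the paper's data structure.
Parse = Tuple[float, Tuple[str, str]]
Predictor = Callable[[str, Tuple[str, str]], float]


def next_word_prob(parses: List[Parse], predictor: Predictor, word: str) -> float:
    """Mix per-parse predictions P(word | h_{-1}, h_0) with weights rho(W_i, T_i)."""
    total = sum(joint for joint, _ in parses)
    if total <= 0.0:
        raise ValueError("parse probabilities must sum to a positive value")
    prob = 0.0
    for joint, heads in parses:
        rho = joint / total  # P(W_i, T_i) renormalised over the parses supplied
        prob += rho * predictor(word, heads)
    return prob


# Toy conditional table P(word | h_{-1}, h_0); all numbers are made up.
TABLE: Dict[Tuple[str, str], Dict[str, float]] = {
    ("<s>", "ended"): {"with": 0.5},
    ("contract", "ended"): {"with": 0.25},
}


def toy_predictor(word: str, heads: Tuple[str, str]) -> float:
    """Look up P(word | heads) in the toy table; unseen events get 0.0."""
    return TABLE.get(heads, {}).get(word, 0.0)


# Two competing analyses of the prefix "the contract ended": one where
# "ended" already heads the whole clause, one where "the contract" is still
# an exposed constituent headed by "contract".
parses = [(0.375, ("<s>", "ended")), (0.125, ("contract", "ended"))]
print(next_word_prob(parses, toy_predictor, "with"))  # 0.75*0.5 + 0.25*0.25 = 0.4375
```

Because the weights are renormalised over whichever parses are supplied, restricting the set (for example by pruning unlikely parses) turns the sum into an approximation of the full computation; a real model would also smooth the conditional table rather than return 0.0 for unseen events.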

doi: 10.21437/Eurospeech.1999-1

Cite as: Jelinek, F., Chelba, C. (1999) Putting language into language modeling. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), keynote paper 1, doi: 10.21437/Eurospeech.1999-1

  author={Frederick Jelinek and Ciprian Chelba},
  title={{Putting language into language modeling}},
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={keynote paper 1},