Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Putting Language Into Language Modeling

Frederick Jelinek, Ciprian Chelba

Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, MD, USA

In this paper we describe the statistical Structured Language Model (SLM) that uses grammatical analysis of the hypothesized sentence segment (prefix) to predict the next word. We first describe the operation of a basic, completely lexicalized SLM that builds up partial parses as it proceeds left to right. We then develop a chart parsing algorithm and, with its help, a method to compute the prediction probabilities P(w_{i+1} | W_i). We suggest useful computational shortcuts, followed by a method of training SLM parameters from text data. Finally, we introduce a more detailed parametrization that involves non-terminal labeling and considerably improves smoothing of SLM statistical parameters. We conclude by presenting certain recognition and perplexity results achieved on standard corpora.

Bibliographic reference. Jelinek, Frederick / Chelba, Ciprian (1999): "Putting language into language modeling", in EUROSPEECH'99, Keynote Paper 1.