8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Automatic Induction of N-Gram Language Models from a Natural Language Grammar

Stephanie Seneff, Chao Wang, Timothy J. Hazen

Massachusetts Institute of Technology, USA

This paper details our work in developing a technique that can automatically generate class n-gram language models from natural language (NL) grammars in dialogue systems. The procedure eliminates the need to maintain the recognizer language model and the NL grammar separately. The resulting language model adopts the standard class n-gram framework for computational efficiency. Moreover, both the n-gram classes and the training sentences are enhanced with semantic/syntactic tags defined in the NL grammar, so that the trained language model preserves the distinctive statistics associated with different word senses. We have applied this approach in several different domains and languages, and have evaluated it on our most mature dialogue systems to assess its competitiveness with pre-existing n-gram language models. The speech recognition performance with the new language model is in fact the best we have achieved in both the JUPITER weather domain and the MERCURY flight reservation domain.
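To illustrate the idea of tag-enhanced classes, the following is a minimal sketch (not the authors' implementation): a class bigram model whose class members carry hypothetical semantic tags from an NL grammar (e.g. "boston:destination" vs. "boston:source"), so that different senses of the same surface word keep separate within-class statistics. The corpus, class names, and tag scheme below are invented for illustration only.

```python
from collections import defaultdict

# Hypothetical tagged training data: each token is either a plain word
# or a (class, tagged_word) pair produced by parsing with the grammar.
corpus = [
    ["<s>", ("CITY", "boston:destination"), "weather", "</s>"],
    ["<s>", "flights", "to", ("CITY", "boston:destination"), "</s>"],
    ["<s>", "flights", "from", ("CITY", "boston:source"), "</s>"],
]

def class_of(tok):
    # Class n-gram: sequence counts are collected over class labels.
    return tok[0] if isinstance(tok, tuple) else tok

bigram = defaultdict(lambda: defaultdict(int))      # counts for class transitions
membership = defaultdict(lambda: defaultdict(int))  # counts for tagged words per class

for sent in corpus:
    for prev, cur in zip(sent, sent[1:]):
        bigram[class_of(prev)][class_of(cur)] += 1
    for tok in sent:
        if isinstance(tok, tuple):
            # Tagged words, not bare words, are counted within the class,
            # so each word sense keeps its own distribution.
            membership[tok[0]][tok[1]] += 1

def prob(sentence):
    """Unsmoothed class bigram probability with within-class word terms."""
    p = 1.0
    for prev, cur in zip(sentence, sentence[1:]):
        c_prev, c_cur = class_of(prev), class_of(cur)
        p *= bigram[c_prev][c_cur] / sum(bigram[c_prev].values())
        if isinstance(cur, tuple):
            members = membership[c_cur]
            p *= members[cur[1]] / sum(members.values())
    return p
```

In this toy corpus the two senses of "boston" receive separate membership counts, which is the effect the tagging is meant to achieve; a real system would add smoothing and derive the corpus from grammar-parsed training data.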


Bibliographic reference. Seneff, Stephanie / Wang, Chao / Hazen, Timothy J. (2003): "Automatic induction of n-gram language models from a natural language grammar", in EUROSPEECH-2003, 641-644.