Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

A Language Model Combining N-grams and Stochastic Finite State Automata

Alexis Nasr, Yannick Estéve, Frédéric Béchet, Thierry Spriet, Renato de Mori

LIA, University of Avignon, France

This paper describes a new kind of language model composed of several local models and a general model that links the local models together. The local models describe subparts of the textual data more finely than a conventional n-gram model trained on the complete corpus; they are built on lexical and syntactic criteria. The local models and the general model are integrated into a single hidden Markov model. Experiments showed a 14% decrease in perplexity compared to a bigram model on a small corpus of telephone communications.
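For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how the perplexity of a bigram baseline and a relative perplexity reduction can be computed. It is not the authors' model: the add-one smoothing, the `<s>`/`</s>` padding tokens, and all function names are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigram contexts and bigrams over sentences padded with <s> and </s>."""
    context_counts, bigram_counts = Counter(), Counter()
    vocab = set()
    for words in sentences:
        tokens = ["<s>"] + list(words) + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab


def bigram_prob(prev, cur, context_counts, bigram_counts, vocab_size):
    """P(cur | prev) with add-one smoothing (an assumption of this sketch)."""
    return (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)


def perplexity(sentences, context_counts, bigram_counts, vocab):
    """Perplexity = 2 ** (-average log2 probability per predicted token)."""
    log_sum, n_tokens = 0.0, 0
    for words in sentences:
        tokens = ["<s>"] + list(words) + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            p = bigram_prob(prev, cur, context_counts, bigram_counts, len(vocab))
            log_sum += math.log2(p)
            n_tokens += 1
    return 2 ** (-log_sum / n_tokens)


def relative_reduction(baseline_pp, new_pp):
    """Relative perplexity decrease of a new model with respect to a baseline."""
    return (baseline_pp - new_pp) / baseline_pp
```

Under this definition, a 14% decrease means that `relative_reduction(baseline_pp, new_pp)` equals 0.14, for example a hypothetical baseline perplexity of 100 falling to 86. These numbers are illustrative and are not taken from the paper.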


Bibliographic reference.  Nasr, Alexis / Estéve, Yannick / Béchet, Frédéric / Spriet, Thierry / Mori, Renato de (1999): "A language model combining n-grams and stochastic finite state automata", In EUROSPEECH'99, 2175-2178.