This paper describes a new kind of language model composed of several local models and a general model that links the local models together. Local models describe subparts of the textual data more finely than a conventional n-gram trained on the complete corpus; they are built on lexical and syntactic criteria. Both the local and the general models are integrated into a single hidden Markov model. Experiments showed a 14% decrease in perplexity compared to a bigram model on a small corpus of telephone communications.
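As background for the perplexity figure quoted above, here is a minimal sketch of how bigram-model perplexity can be computed on held-out text. This is an illustrative add-alpha smoothed baseline, not the paper's combined n-gram/finite-state model; the function name and smoothing choice are assumptions for the example.

```python
import math
from collections import Counter

def bigram_perplexity(train, test, alpha=1.0):
    # Illustrative add-alpha smoothed bigram model (hypothetical helper,
    # not the paper's HMM-based combined model).
    vocab = set(train) | set(test)
    V = len(vocab)
    bigrams = Counter(zip(train, train[1:]))   # counts of (w1, w2) pairs
    unigrams = Counter(train[:-1])             # counts of history words w1
    log_prob = 0.0
    n = 0
    for w1, w2 in zip(test, test[1:]):
        # Smoothed conditional probability P(w2 | w1)
        p = (bigrams[(w1, w2)] + alpha) / (unigrams[w1] + alpha * V)
        log_prob += math.log2(p)
        n += 1
    # Perplexity is 2 to the average negative log-probability per bigram
    return 2 ** (-log_prob / n)
```

A "14% decrease in perplexity" then means the combined model's value of this quantity is 14% lower than the bigram baseline's on the same test corpus.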
Cite as: Nasr, A., Estève, Y., Béchet, F., Spriet, T., Mori, R. de (1999) A language model combining n-grams and stochastic finite state automata. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 2175-2178, doi: 10.21437/Eurospeech.1999-481
@inproceedings{nasr99_eurospeech,
  author={Alexis Nasr and Yannick Estève and Frédéric Béchet and Thierry Spriet and Renato de Mori},
  title={{A language model combining n-grams and stochastic finite state automata}},
  year={1999},
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={2175--2178},
  doi={10.21437/Eurospeech.1999-481}
}