This paper describes a method for detecting impossible bigrams in a space of V² bigrams, where V is the size of the vocabulary. The idea is to discard all ungrammatical events, which are impossible in a well-written text, and thereby to improve the language model. In speech recognition, we also expect to reduce the complexity of the search algorithm by making fewer comparisons. To achieve this, we extract the impossible bigrams using automatic rules based on grammatical classes. Ungrammatical biclass associations are detected, and all the corresponding bigrams are analyzed and marked as possible or impossible events. Because grammatical rules in natural language can have exceptions, we maintain an exception list for each retrieved rule.
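The filtering idea can be illustrated with a minimal sketch. This is not the authors' implementation: the toy lexicon, the class labels, the two biclass rules, and the exception entry are all hypothetical values chosen for illustration, and the sketch assumes each word belongs to exactly one grammatical class.

```python
from itertools import product

# Hypothetical toy lexicon: word -> grammatical class (illustrative only).
WORD_CLASS = {
    "the": "DET",
    "a": "DET",
    "cat": "NOUN",
    "dog": "NOUN",
    "sleeps": "VERB",
    "eats": "VERB",
}

# Hypothetical rules: class pairs (biclasses) assumed to be ungrammatical.
IMPOSSIBLE_BICLASSES = {("DET", "DET"), ("DET", "VERB")}

# Hypothetical exception lists, one per rule: bigrams that remain possible
# even though their biclass is flagged. The entry below is arbitrary.
EXCEPTIONS = {
    ("DET", "DET"): set(),
    ("DET", "VERB"): {("the", "eats")},
}


def impossible_bigrams(word_class, impossible_biclasses, exceptions):
    """Return the word bigrams ruled out by the biclass rules."""
    ruled_out = set()
    # Enumerate all V^2 candidate bigrams over the vocabulary.
    for w1, w2 in product(word_class, repeat=2):
        biclass = (word_class[w1], word_class[w2])
        if biclass not in impossible_biclasses:
            continue
        # A bigram on the rule's exception list stays a possible event.
        if (w1, w2) in exceptions.get(biclass, set()):
            continue
        ruled_out.add((w1, w2))
    return ruled_out


IMPOSSIBLE = impossible_bigrams(WORD_CLASS, IMPOSSIBLE_BICLASSES, EXCEPTIONS)
```

With this toy vocabulary (V = 6, so 36 candidate bigrams), the two rules flag 8 bigrams and the exception list rescues one of them. A language model or decoder could then skip or zero out the remaining flagged pairs, which is where the expected reduction in comparisons would come from.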
Cite as: Brun, A., Langlois, D., Smaili, K., Haton, J.-P. (2000) Discarding impossible events from statistical language models. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 3, 981-984, doi: 10.21437/ICSLP.2000-699
@inproceedings{brun00_icslp,
  author={Armelle Brun and David Langlois and Kamel Smaili and Jean-Paul Haton},
  title={{Discarding impossible events from statistical language models}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  volume={3},
  pages={981--984},
  doi={10.21437/ICSLP.2000-699}
}