September 22-25, 1997
Over the last few years, several alternatives to N-gram language models have been proposed that are based on stochastic regular grammars. These grammars are estimated from data by means of Grammatical Inference algorithms. In particular, the Morphic Generator Grammatical Inference (MGGI) methodology has been applied to tasks involving written natural language queries to databases. As with N-gram models, language models obtained through this methodology require smoothing techniques. This work incorporates a version of the well-known Back-Off smoothing method into the MGGI language models to address the problem of estimating probabilities for events unseen in the training corpus, and shows the behaviour of the smoothed MGGI models on two written-sentence tasks. The results indicate that the smoothed MGGI model performs better than the standard smoothed bigram model.
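To illustrate the general idea of Back-Off smoothing mentioned above, the following is a minimal sketch for a plain bigram model, not the version the authors apply to MGGI models. It assumes absolute discounting with a fixed `discount` and backs off to a maximum-likelihood unigram distribution; the function name `train_backoff_bigram`, the boundary symbols `<s>` and `</s>`, and the discount value are all hypothetical choices made for this example.

```python
from collections import Counter, defaultdict


def train_backoff_bigram(sentences, discount=0.5):
    """Return (prob, vocab) for a back-off bigram model.

    Assumption: absolute discounting with 0 < discount < 1, backing off
    to a maximum-likelihood unigram distribution.
    """
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            unigrams[word] += 1          # counts of predicted tokens only
            bigrams[prev][word] += 1

    total = sum(unigrams.values())
    p_uni = {w: c / total for w, c in unigrams.items()}

    def prob(word, prev):
        followers = bigrams.get(prev)
        if not followers:
            # Unknown context: fall back to the unigram estimate.
            return p_uni.get(word, 0.0)
        context_count = sum(followers.values())
        if word in followers:
            # Seen bigram: discounted relative frequency.
            return (followers[word] - discount) / context_count
        # Unseen bigram: share the reserved mass among unseen words,
        # in proportion to their unigram probabilities.
        reserved = discount * len(followers) / context_count
        unseen_mass = 1.0 - sum(p_uni[w] for w in followers)
        if unseen_mass <= 0.0:
            return 0.0
        return reserved * p_uni.get(word, 0.0) / unseen_mass

    return prob, sorted(p_uni)
```

In this sketch, a bigram never observed in training still receives a non-zero probability as long as its second word occurs somewhere in the training data, while words outside the training vocabulary receive zero; a complete system would also need a policy for such out-of-vocabulary words.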
Bibliographic reference. Segarra, E. / Hurtado, L. (1997): "Construction of language models using the morphic generator grammatical inference (MGGI) methodology", In EUROSPEECH-1997, 2695-2698.