September 22-25, 1997
This paper proposes a novel variable-length class- based language model that integrates local and global constraints. In this model, the classes are iteratively recreated by grouping consecutive words and by splitting initial part-of speech (POS) clusters into finer clusters (word-classes). The main characteristic of this modeling is that these operations of grouping and splitting is carried out selectively, taking into account global constraints between noncontiguous words on the basis of a minimum entropy criterion. To capture the global constraints, the model takes into account the sequences of the function words and of the content words, which are expected to respectively represent the syntactic and semantic relationships between words. Experiments showed that the perplexity of the proposed model for the test corpus is lower than that of conventional models and that this model requires a small number of statistical parameters, showing the model's effectiveness.
Bibliographic reference. Matsunaga, Shoichi / Sagayama, Shigeki (1997): "Variable-length language modeling integrating global constraints", In EUROSPEECH-1997, 2719-2722.