5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Fast Computation of Maximum Entropy / Minimum Divergence Feature Gain

Harry Printz

IBM, USA

Maximum entropy / minimum divergence modeling is a powerful technique for constructing probability models; it has been applied to a variety of problems in natural language processing. A maximum entropy / minimum divergence (MEMD) model is built from a base model and a set of feature functions, also called simply features, whose empirical expectations on some training corpus are known. A fundamental difficulty with this technique is that while there are typically millions of features that could be incorporated into a given model, in general it is not computationally feasible, or even desirable, to use them all. Thus some means must be devised for determining each feature's predictive power, also known as its gain. Once the gains are known, the features can be ranked by utility and only the most gainful ones retained. This paper presents a new algorithm for computing feature gain that is fast, accurate and memory-efficient.
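As background for the abstract's notion of gain: in the standard MEMD feature-induction setting (Berger, Della Pietra & Della Pietra, 1996), the gain of a single binary feature added to the base model, with its exponential weight chosen optimally, reduces in closed form to the Kullback-Leibler divergence between two Bernoulli distributions, one given by the feature's empirical expectation and one by its expectation under the base model. The sketch below illustrates that textbook computation and the ranking step the abstract describes; it is not Printz's fast algorithm, and the feature names and expectation values are made up for illustration.

```python
import math

def feature_gain(p_emp, p_base):
    """Gain of a single binary feature added in isolation to a base model.

    p_emp:  empirical expectation of the feature on the training corpus
    p_base: expectation of the feature under the base model

    With the feature's weight set optimally, the gain equals the KL
    divergence between Bernoulli(p_emp) and Bernoulli(p_base).
    """
    if not (0.0 < p_emp < 1.0 and 0.0 < p_base < 1.0):
        raise ValueError("expectations must lie strictly in (0, 1)")
    return (p_emp * math.log(p_emp / p_base)
            + (1.0 - p_emp) * math.log((1.0 - p_emp) / (1.0 - p_base)))

# Hypothetical candidate features: name -> (empirical, base-model) expectation.
candidates = {"f1": (0.30, 0.10), "f2": (0.20, 0.18), "f3": (0.05, 0.25)}

# Rank features by gain and keep only the most gainful ones.
ranked = sorted(candidates,
                key=lambda f: feature_gain(*candidates[f]),
                reverse=True)
```

The gain is always non-negative and is zero exactly when the base model already matches the feature's empirical expectation, which is why it serves as a measure of a feature's predictive power.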


Bibliographic reference. Printz, Harry (1998): "Fast computation of maximum entropy / minimum divergence feature gain", in ICSLP-1998, paper 1129.