Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Efficient Training Methods for Maximum Entropy Language Modeling

Jun Wu, Sanjeev Khudanpur

Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD, USA

Maximum entropy language modeling techniques combine different sources of statistical dependence, such as syntactic relationships, topic cohesiveness and collocation frequency, in a unified and effective language model. These techniques, however, are computationally very intensive, particularly during model estimation, compared to the more prevalent alternative of interpolating several simple models, each capturing one type of dependency. In this paper we present ways to significantly reduce this complexity by reorganizing the required computations. We show that in the case of a model with N-gram constraints, each iteration of the parameter estimation algorithm requires the same amount of computation as estimating a comparable back-off N-gram model. In general, the computational cost of each iteration in model estimation is linear in the number of distinct "histories" seen in the training corpus, times a model-class dependent factor. The reorganization focuses mainly on reducing this multiplicative factor from the size of the vocabulary to the average number of distinct words seen following a history. A 15-fold speed-up has been observed when using this method to estimate a language model that incorporates syntactic head-word constraints, nonterminal-label constraints and topic-unigram constraints together with N-grams for the Switchboard corpus. This model achieves a perplexity reduction of 13% and an absolute word error rate reduction of 1.5% compared to a trigram model.
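To make the reorganization concrete, the sketch below illustrates the general idea for a toy maximum entropy model with only unigram and bigram constraints: the normalizer Z(h) is rewritten as a vocabulary-wide unigram sum (computed once per iteration) plus corrections for the few words actually seen following h, so the per-history cost drops from the vocabulary size to the number of distinct followers. This is a minimal illustration under assumed feature weights (lam_uni, lam_bi) and a toy vocabulary, not the authors' implementation.

import math
from collections import defaultdict

def naive_Z(h, vocab, lam_uni, lam_bi):
    # O(|V|) per history: sum over the entire vocabulary.
    return sum(math.exp(lam_uni[w] + lam_bi.get((h, w), 0.0)) for w in vocab)

def fast_Z(h, Z_uni, followers, lam_uni, lam_bi):
    # O(#followers of h) per history.
    # Z_uni = sum_w exp(lam_uni[w]) is computed once per training iteration;
    # only words actually seen after h activate a bigram feature, so only
    # their terms need to be corrected relative to the unigram-only default.
    z = Z_uni
    for w in followers[h]:
        z += math.exp(lam_uni[w] + lam_bi[(h, w)]) - math.exp(lam_uni[w])
    return z

# Toy usage with made-up weights:
vocab = ["a", "b", "c", "d"]
lam_uni = {w: 0.1 for w in vocab}
lam_bi = {("a", "b"): 0.5, ("a", "c"): -0.2}
followers = defaultdict(list, {"a": ["b", "c"]})

Z_uni = sum(math.exp(lam_uni[w]) for w in vocab)
assert abs(naive_Z("a", vocab, lam_uni, lam_bi)
           - fast_Z("a", Z_uni, followers, lam_uni, lam_bi)) < 1e-9

The same bookkeeping extends, with model-class dependent corrections, to the head-word, nonterminal-label and topic features discussed in the paper.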


Bibliographic reference. Wu, Jun / Khudanpur, Sanjeev (2000): "Efficient training methods for maximum entropy language modeling", in ICSLP-2000, vol. 3, 114-118.