12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Empirical Evaluation and Combination of Advanced Language Modeling Techniques

Tomáš Mikolov (1), Anoop Deoras (2), Stefan Kombrink (1), Lukáš Burget (1), Jan Černocký (1)

(1) Brno University of Technology, Czech Republic
(2) Johns Hopkins University, USA

We present results obtained with several advanced language modeling techniques, including class-based, cache, maximum entropy, structured, and random forest language models, as well as several types of neural network based language models. We also show results obtained after combining all of these models using linear interpolation. We conclude that, for both small and moderately sized tasks, the combination of models yields new state-of-the-art results that are significantly better than the performance of any individual model. The perplexity reductions obtained are over 50% against a Good-Turing trigram baseline and over 40% against a modified Kneser-Ney smoothed 5-gram.
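
As a rough illustration of the combination step named in the abstract, the sketch below linearly interpolates per-word probabilities from two models and compares perplexities. It is a minimal example and not the authors' implementation: the function names, the toy probabilities, and the equal weights are all assumptions made here for illustration. In practice the interpolation weights would typically be tuned on held-out data.

```python
import math


def perplexity(word_probs):
    """Perplexity of a test sequence, given the probability assigned to each word."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


def interpolate(model_probs, weights):
    """Linearly interpolate per-word probabilities from several models.

    model_probs: one list per model, all scored on the same test words.
    weights: one non-negative weight per model, summing to 1.
    """
    if len(model_probs) != len(weights):
        raise ValueError("need exactly one weight per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    return [
        sum(w * p for w, p in zip(weights, probs_for_word))
        for probs_for_word in zip(*model_probs)
    ]


def relative_reduction(baseline_ppl, new_ppl):
    """Relative perplexity reduction against a baseline, as a fraction."""
    return 1.0 - new_ppl / baseline_ppl


# Toy, made-up probabilities for four test words (illustrative only).
model_a = [0.20, 0.01, 0.30, 0.02]
model_b = [0.02, 0.20, 0.03, 0.25]

# Equal weights are an assumption, not values from the paper.
mixed = interpolate([model_a, model_b], [0.5, 0.5])

print("model A perplexity:", perplexity(model_a))
print("model B perplexity:", perplexity(model_b))
print("interpolated perplexity:", perplexity(mixed))
print("relative reduction vs. A:", relative_reduction(perplexity(model_a), perplexity(mixed)))
```

In this toy data each model is confident on words where the other is not, which is why the mixture scores a lower perplexity than either model alone. Interpolation helps most when the combined models make complementary errors.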

Bibliographic reference. Mikolov, Tomáš / Deoras, Anoop / Kombrink, Stefan / Burget, Lukáš / Černocký, Jan (2011): "Empirical evaluation and combination of advanced language modeling techniques", In INTERSPEECH-2011, 605-608.