September 22-25, 1997
This paper presents two extensions of the standard interpolated word trigram and cache model: the extension of the trigram model by useful word m-grams with m > 3, resulting in a varigram model, and the addition of topic-specific trigram models. We give the criteria for selecting useful m-grams and for partitioning the training corpus into topic-specific subcorpora. We apply both extensions, separately and in combination, to corpora of 4 and 39 million words taken from the Wall Street Journal Corpus and show that perplexity reductions of up to 19% are achieved on the largest corpus. We also performed some recognition experiments.
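For readers unfamiliar with the baseline, the following is a minimal sketch of how a static model probability might be linearly interpolated with a unigram cache, and how perplexity and its relative reduction are computed. It is not taken from the paper: the function names, the unigram form of the cache, and the single interpolation weight `lam` are all assumptions made for illustration, and the abstract does not specify the authors' exact formulation.

```python
import math
from collections import Counter


def cache_prob(word, recent_words):
    """Relative frequency of `word` among recently seen words.

    Assumption: a plain unigram cache; the paper's cache may differ.
    """
    if not recent_words:
        return 0.0
    return Counter(recent_words)[word] / len(recent_words)


def interpolate(p_static, p_cache, lam):
    """Linear interpolation of a static model probability with a cache probability.

    `lam` is a hypothetical cache weight in [0, 1].
    """
    return (1.0 - lam) * p_static + lam * p_cache


def perplexity(probs):
    """Perplexity: exp of the negative mean log probability of the test words."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))


def relative_reduction(pp_base, pp_new):
    """Relative perplexity reduction of a new model over a baseline."""
    return (pp_base - pp_new) / pp_base
```

In this sketch a "reduction of 19%" corresponds to `relative_reduction(pp_base, pp_new)` being 0.19, where `pp_base` would be the perplexity of the baseline trigram and cache model and `pp_new` that of the extended model.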
Bibliographic reference. Martin, Sven C. / Liermann, Jörg / Ney, Hermann (1997): "Adaptive topic-dependent language modelling using word-based varigrams", In EUROSPEECH-1997, 1447-1450.