This paper describes a method for improving the accuracy of N-gram language models that can be applied to on-line applications. The precision of a long-distance language model such as LDA depends on the context length, i.e., the length of the history used for prediction. In the proposed method, each of multiple LDA units estimates an optimal context length separately; these predictions are then integrated and N-gram probabilities are computed. The method directly estimates the context length best suited to prediction. Results show that the method improves topic-dependent N-gram probabilities, particularly for words related to specific topics, yielding higher and more stable performance compared with an existing method.
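As a rough illustration of the general idea (not the paper's actual formulation), the sketch below shows how several topic-model "units", each conditioned on a different context length, might score the next word and be interpolated with a baseline N-gram probability. All names (`topic_mixture`, `unit_word_prob`, `combined_prob`), the simple topic-posterior computation, and the fixed interpolation weights are illustrative assumptions; the paper instead optimizes the context length per unit.

```python
import math

# Hypothetical sketch: each "LDA unit" looks at the last L words of history,
# infers a topic mixture from that window, and scores the next word.
# Unit outputs are linearly interpolated with a baseline N-gram probability.

def topic_mixture(window, topic_word_probs, topic_priors):
    # Crude topic posterior given the window: p(z | window) ∝ p(z) * Π_w p(w | z)
    scores = {}
    for z, prior in topic_priors.items():
        log_s = math.log(prior)
        for w in window:
            log_s += math.log(topic_word_probs[z].get(w, 1e-6))
        scores[z] = log_s
    m = max(scores.values())
    exp_s = {z: math.exp(s - m) for z, s in scores.items()}
    total = sum(exp_s.values())
    return {z: v / total for z, v in exp_s.items()}

def unit_word_prob(word, history, L, topic_word_probs, topic_priors):
    # One unit with a fixed context length L: mix topic-dependent word probs
    window = history[-L:] if L > 0 else []
    mix = topic_mixture(window, topic_word_probs, topic_priors)
    return sum(mix[z] * topic_word_probs[z].get(word, 1e-6) for z in mix)

def combined_prob(word, history, ngram_prob, units, lambdas):
    # Interpolate the baseline N-gram prob with each unit's topic-based prob;
    # units is a list of (L, topic_word_probs, topic_priors) tuples
    p = lambdas[0] * ngram_prob
    for (L, tw, tp), lam in zip(units, lambdas[1:]):
        p += lam * unit_word_prob(word, history, L, tw, tp)
    return p
```

With a sports-heavy history, such a unit assigns a topic-related word like "ball" a higher probability than it would under a finance-heavy history, which is the kind of topic-dependent sharpening the abstract describes.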
Bibliographic reference. Nakamura, Akira / Hayamizu, Satoru (2010): "Topic-dependent n-gram models based on optimization of context lengths in LDA", in INTERSPEECH-2010, 3066-3069.