11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Topic-Dependent N-Gram Models Based on Optimization of Context Lengths in LDA

Akira Nakamura (1), Satoru Hayamizu (2)

(1) SANYO Electric Co. Ltd., Japan
(2) Gifu University, Japan

This paper describes a method that improves the accuracy of N-gram language models and can be applied to on-line applications. The precision of a long-distance language model such as LDA is influenced by the context length, i.e., the length of the history used for prediction. In the proposed method, each of multiple LDA units estimates an optimum context length separately; those predictions are then integrated, and N-gram probabilities are calculated. The method directly estimates the optimum context length suitable for prediction. Results show that the method improves topic-dependent N-gram probabilities, particularly for words related to specific topics, yielding higher and more stable performance compared to an existing method.
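The scheme the abstract outlines, topic inference over a history window whose length is chosen per LDA unit, then integration with a baseline model, can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the toy topic-word table, the likelihood-based topic inference, the held-out-word criterion for choosing the context length, and the linear interpolation with a baseline unigram are all simplifying assumptions.

```python
# Hedged sketch: topic-dependent word probabilities with an optimized
# context length, loosely following the abstract's description.
# All data and parameter choices below are illustrative assumptions.

# Toy stand-in for trained LDA topics: P(word | topic).
TOPICS = {
    "sports": {"game": 0.4, "team": 0.3, "model": 0.1, "speech": 0.2},
    "speech": {"game": 0.1, "team": 0.1, "model": 0.3, "speech": 0.5},
}
VOCAB = ["game", "team", "model", "speech"]

def topic_mixture(history, context_len):
    """Estimate P(topic | last context_len words) by normalized likelihood."""
    window = history[-context_len:] if context_len else []
    scores = {}
    for t, dist in TOPICS.items():
        like = 1.0
        for w in window:
            like *= dist.get(w, 1e-6)  # tiny floor for unseen words
        scores[t] = like
    z = sum(scores.values())
    return {t: s / z for t, s in scores.items()}

def lda_unigram(history, context_len):
    """Topic-dependent unigram: sum_t P(w | t) * P(t | window)."""
    mix = topic_mixture(history, context_len)
    return {w: sum(mix[t] * TOPICS[t].get(w, 0.0) for t in TOPICS)
            for w in VOCAB}

def best_context_len(history, candidates=(1, 2, 4)):
    """Pick the context length that best predicts the held-out last word
    of the history -- a crude stand-in for the paper's per-unit
    context-length optimization."""
    if len(history) < 2:
        return candidates[0]
    prev, target = history[:-1], history[-1]
    return max(candidates,
               key=lambda L: lda_unigram(prev, L).get(target, 0.0))

def topic_ngram(history, base_unigram, lam=0.5):
    """Interpolate the LDA prediction (at its optimized context length)
    with a baseline model to obtain the final probabilities."""
    L = best_context_len(history)
    lda = lda_unigram(history, L)
    return {w: lam * lda[w] + (1 - lam) * base_unigram.get(w, 0.0)
            for w in VOCAB}
```

For example, given a history dominated by one topic, the interpolated distribution boosts words of that topic relative to the topic-independent baseline, which is the effect the abstract reports for topic-related words.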


Bibliographic reference: Nakamura, Akira / Hayamizu, Satoru (2010): "Topic-dependent n-gram models based on optimization of context lengths in LDA", in Proc. INTERSPEECH 2010, 3066-3069.