10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Topic Dependent Language Model Based on Topic Voting on Noun History

Welly Naptali, Masatoshi Tsuchiya, Seiichi Nakagawa

Toyohashi University of Technology, Japan

Language models (LMs) are important in automatic speech recognition systems. In this paper, we propose a new approach to a topic dependent LM, where the topic is determined in an unsupervised manner. Latent Semantic Analysis (LSA) is employed to reveal hidden (latent) relations among nouns in the context words. To decide the topic of an event, a fixed-size word history sequence (window) is observed, and voting is then carried out based on noun class occurrences weighted by a confidence measure. Experiments on the Wall Street Journal corpus and the Mainichi Shimbun (Japanese newspaper) corpus show that our proposed method yields lower perplexity than the comparative baselines, including a word-based/class-based n-gram LM, their interpolated LM, a cache-based LM, and the Latent Dirichlet Allocation (LDA)-based topic dependent LM.
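The abstract outlines the mechanism only at a high level: LSA groups nouns into latent classes, and the topic of the current event is chosen by confidence-weighted voting over the nouns in a fixed-size history window. The following is a minimal, hypothetical sketch of that idea, not the authors' implementation; the noun-document count matrix, the number of latent classes `k`, the window size, and the confidence measure (dominance of the strongest latent dimension) are all illustrative assumptions.

```python
import numpy as np

def lsa_noun_topics(counts, k):
    """Assign each noun to a latent class via LSA (truncated SVD).

    counts : (nouns x documents) co-occurrence count matrix.
    Returns (topics, conf): a class id per noun and a confidence
    weight per noun (dominance of its strongest latent dimension).
    Confidence definition here is an illustrative assumption.
    """
    U, s, _ = np.linalg.svd(counts, full_matrices=False)
    noun_vecs = U[:, :k] * s[:k]                  # latent noun representations
    mags = np.abs(noun_vecs)                      # sign-invariant magnitudes
    topics = mags.argmax(axis=1)                  # strongest latent dimension
    conf = mags.max(axis=1) / (mags.sum(axis=1) + 1e-12)
    return topics, conf

def vote_topic(noun_history, topics, conf, window=5):
    """Pick the topic of the current event by weighted voting over the
    last `window` nouns in the history."""
    votes = {}
    for n in noun_history[-window:]:
        t = int(topics[n])
        votes[t] = votes.get(t, 0.0) + float(conf[n])
    return max(votes, key=votes.get)

# Toy example: 4 nouns, 3 documents, 2 latent classes.
counts = np.array([[5.0, 0.0, 0.0],
                   [4.0, 1.0, 0.0],
                   [0.0, 5.0, 0.0],
                   [0.0, 4.0, 1.0]])
topics, conf = lsa_noun_topics(counts, k=2)
topic = vote_topic([0, 1, 2], topics, conf, window=3)
```

In the paper's setting, the winning class would then select a topic dependent LM component; the sketch above stops at the voting step.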


Bibliographic reference. Naptali, Welly / Tsuchiya, Masatoshi / Nakagawa, Seiichi (2009): "Topic dependent language model based on topic voting on noun history", in INTERSPEECH-2009, 2683-2686.