ISCA Archive Interspeech 2009

Topic dependent language model based on topic voting on noun history

Welly Naptali, Masatoshi Tsuchiya, Seiichi Nakagawa

Language models (LMs) are important in automatic speech recognition systems. In this paper, we propose a new approach to a topic dependent LM, where the topic is decided in an unsupervised manner. Latent Semantic Analysis (LSA) is employed to reveal hidden (latent) relations among nouns in the context words. To decide the topic of an event, a fixed-size word history sequence (window) is observed, and voting is then carried out based on noun class occurrences weighted by a confidence measure. Experiments on the Wall Street Journal corpus and the Mainichi Shimbun (Japanese newspaper) corpus show that our proposed method achieves lower perplexity than the comparative baselines, including word-based and class-based n-gram LMs, their interpolated LM, a cache-based LM, and the Latent Dirichlet Allocation (LDA)-based topic dependent LM.
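
As a rough illustration of the voting step described in the abstract, the following Python sketch picks a topic from a fixed-size word window by summing confidence weights per noun class. It is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `vote_topic`, the `noun_to_topic` mapping (standing in for noun classes that the paper derives with LSA), the per-noun `confidence` weights, and the default window size are all hypothetical, since the abstract does not specify them.

```python
from collections import defaultdict


def vote_topic(history, noun_to_topic, confidence, window=5):
    """Return the topic with the largest confidence-weighted vote, or None.

    history       -- list of previously seen words, oldest first
    noun_to_topic -- hypothetical mapping from a noun to its topic (noun class)
    confidence    -- hypothetical per-noun weight; defaults to 1.0 if missing
    window        -- number of most recent words to observe (assumed value)
    """
    votes = defaultdict(float)
    # Only the last `window` words of the history are observed.
    for word in history[-window:]:
        topic = noun_to_topic.get(word)
        if topic is None:
            continue  # words outside the noun classes cast no vote
        votes[topic] += confidence.get(word, 1.0)
    if not votes:
        return None  # no noun in the window, so no topic can be decided
    return max(votes, key=votes.get)


# Toy data, invented purely for illustration.
toy_topics = {"stocks": "finance", "game": "sports"}
toy_confidence = {"stocks": 0.9, "game": 0.4}
toy_history = ["stocks", "fell", "after", "the", "game"]
chosen = vote_topic(toy_history, toy_topics, toy_confidence, window=5)
```

In a full topic dependent LM the chosen topic would then select or weight a topic-specific word distribution; that part is left out here because the abstract gives no details about it.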


doi: 10.21437/Interspeech.2009-124

Cite as: Naptali, W., Tsuchiya, M., Nakagawa, S. (2009) Topic dependent language model based on topic voting on noun history. Proc. Interspeech 2009, 2683-2686, doi: 10.21437/Interspeech.2009-124

@inproceedings{naptali09_interspeech,
  author={Welly Naptali and Masatoshi Tsuchiya and Seiichi Nakagawa},
  title={{Topic dependent language model based on topic voting on noun history}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={2683--2686},
  doi={10.21437/Interspeech.2009-124}
}