INTERSPEECH 2008
9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Using Latent Dirichlet Allocation to Incorporate Domain Knowledge for Topic Transition Detection

Xiaodan Zhu, Xuming He, Cosmin Munteanu, Gerald Penn

University of Toronto, Canada

This paper studies the automatic detection of topic transitions in recorded presentations. One way to detect them is to match slide content directly against presentation transcripts using a similarity metric. Such literal matching, however, ignores domain-specific knowledge and is sensitive to speech recognition errors. We therefore incorporate relevant written materials, e.g., textbooks for lectures, which convey semantic relationships between words, in particular domain-specific ones. To this end, we train latent Dirichlet allocation (LDA) models on these materials and measure the similarity between slides and transcripts in the acquired hidden-topic space. This similarity is then combined with literal matching. Experiments show that the proposed approach reduces errors in slide transition detection by 17.41% on manual transcripts and 27.37% on automatic transcripts.
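The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how LDA was trained, how topic distributions were inferred for slides and transcripts, or how the two similarities were combined. The sketch assumes collapsed Gibbs sampling for training, a simple fold-in (EM with fixed topics) for inference, cosine similarity in both spaces, and linear interpolation with a hypothetical `weight`. All function names, hyperparameters, and the toy "textbook" corpus are invented for illustration.

```python
import math
import random
from collections import Counter


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (assumed training method).

    Returns (vocab, phi), where phi[k][v] is P(word v | topic k)."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    V = len(vocab)
    n_kw = [[0] * V for _ in range(num_topics)]  # topic-word counts
    n_dk = [[0] * num_topics for _ in docs]      # document-topic counts
    n_k = [0] * num_topics                       # tokens per topic
    z = []                                       # topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            n_kw[k][vocab[w]] += 1
            n_dk[d][k] += 1
            n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = vocab[w], z[d][i]
                # Remove the token, resample its topic, add it back.
                n_kw[k][v] -= 1
                n_dk[d][k] -= 1
                n_k[k] -= 1
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][v] + beta) / (n_k[j] + V * beta)
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights)[0]
                z[d][i] = k
                n_kw[k][v] += 1
                n_dk[d][k] += 1
                n_k[k] += 1
    phi = [
        [(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
        for k in range(num_topics)
    ]
    return vocab, phi


def infer_topics(words, vocab, phi, alpha=0.1, iters=50):
    """Fold-in inference (a simplification): EM over theta with phi fixed."""
    ids = [vocab[w] for w in words if w in vocab]  # out-of-vocabulary words are dropped
    K = len(phi)
    theta = [1.0 / K] * K
    for _ in range(iters):
        counts = [alpha] * K
        for v in ids:
            post = [theta[k] * phi[k][v] for k in range(K)]
            total = sum(post)
            for k in range(K):
                counts[k] += post[k] / total
        norm = sum(counts)
        theta = [c / norm for c in counts]
    return theta


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def literal_similarity(a_words, b_words):
    """Bag-of-words cosine: the 'literal matching' baseline."""
    ca, cb = Counter(a_words), Counter(b_words)
    keys = sorted(set(ca) | set(cb))
    return cosine([ca[k] for k in keys], [cb[k] for k in keys])


def combined_similarity(slide, segment, vocab, phi, weight=0.5):
    """Linear interpolation of topic-space and literal similarity (assumed)."""
    topical = cosine(infer_topics(slide, vocab, phi), infer_topics(segment, vocab, phi))
    return weight * topical + (1.0 - weight) * literal_similarity(slide, segment)


# Toy stand-in for the domain text (e.g., a textbook); not data from the paper.
textbook = [
    "hidden markov model states transition emission viterbi decoding".split(),
    "viterbi decoding finds best states sequence hidden markov model".split(),
    "parsing grammar syntax tree rules context free grammar".split(),
    "syntax tree parsing uses grammar rules chart parsing".split(),
]
vocab, phi = train_lda(textbook, num_topics=2)

slide = "viterbi emission".split()
segment = "decoding states sequence".split()
print("literal: ", literal_similarity(slide, segment))
print("combined:", combined_similarity(slide, segment, vocab, phi))
```

The slide and transcript segment in this toy example share no words, so the literal score is zero, while the topic-space term still contributes a nonzero score. This is the situation the abstract describes, where literal matching alone misses a domain-specific relationship.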


Bibliographic reference. Zhu, Xiaodan / He, Xuming / Munteanu, Cosmin / Penn, Gerald (2008): "Using latent Dirichlet allocation to incorporate domain knowledge for topic transition detection", in INTERSPEECH-2008, 2443-2445.