This paper studies automatic detection of topic transitions in recorded presentations. A straightforward approach matches slide content directly against presentation transcripts using a similarity metric. Such literal matching, however, misses domain-specific knowledge and is sensitive to speech recognition errors. In this paper, we incorporate relevant written materials, e.g., textbooks for lectures, which convey semantic relationships between words, in particular domain-specific ones. To this end, we train latent Dirichlet allocation (LDA) models on these materials and measure the similarity between slides and transcripts in the acquired hidden-topic space. This similarity is then combined with the literal matching score. Experiments show that the proposed approach reduces errors in slide transition detection by 17.41% on manual transcripts and 27.37% on automatic transcripts.
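The idea of combining a literal match with a topic-space match can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the two similarities are combined, so the linear interpolation and its `weight` parameter are assumptions, as are all function names. The topic distributions are assumed to have been inferred beforehand by an LDA model trained on the related written materials, and they are passed in as plain lists of probabilities.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def literal_similarity(slide_text, transcript_text):
    """Literal matching: cosine over raw word counts (whitespace tokens)."""
    slide_counts = Counter(slide_text.lower().split())
    transcript_counts = Counter(transcript_text.lower().split())
    return cosine(slide_counts, transcript_counts)


def topic_similarity(slide_topics, transcript_topics):
    """Topic-space matching: cosine over per-document topic distributions.

    Both arguments are hypothetical inputs: topic probability lists that an
    LDA model trained on domain texts would assign to each document.
    """
    return cosine(dict(enumerate(slide_topics)),
                  dict(enumerate(transcript_topics)))


def combined_similarity(slide_text, transcript_text,
                        slide_topics, transcript_topics, weight=0.5):
    """Assumed combination rule: linear interpolation of the two scores."""
    literal = literal_similarity(slide_text, transcript_text)
    topical = topic_similarity(slide_topics, transcript_topics)
    return weight * literal + (1.0 - weight) * topical


# Toy example: the slide and the transcript share no surface words,
# but their (made-up) topic distributions are close.
slide = "hidden markov model decoding"
transcript = "we decode with the viterbi algorithm"
slide_topics = [0.8, 0.1, 0.1]
transcript_topics = [0.7, 0.2, 0.1]

literal_score = literal_similarity(slide, transcript)
topic_score = topic_similarity(slide_topics, transcript_topics)
combined_score = combined_similarity(slide, transcript,
                                     slide_topics, transcript_topics)
```

In this toy case the literal score is zero because no words overlap, while the topic-space score stays high, so the combined score still indicates that the transcript segment belongs to the slide. That is the kind of gap the abstract attributes to literal matching alone.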
Bibliographic reference. Zhu, Xiaodan / He, Xuming / Munteanu, Cosmin / Penn, Gerald (2008): "Using latent Dirichlet allocation to incorporate domain knowledge for topic transition detection", In INTERSPEECH-2008, 2443-2445.