Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Statistical Methods for Topic Segmentation

Satya Dharanipragada, Martin Franz, J. Scott McCarley, K. Papineni, Salim Roukos, T. Ward, W.-J. Zhu

IBM T.J. Watson Research Center, Yorktown Heights, NY, USA

Automatic topic segmentation is an important technology for multimedia archival and retrieval systems. In this paper we present an algorithm for topic segmentation that combines machine learning, statistical natural language processing, and information retrieval techniques. The performance of this algorithm is measured by counting misses and false alarms against a manually segmented corpus. We present our results on the widely used TDT2 and TDT3 corpora provided by NIST. Most of the techniques described are independent of the source language. We demonstrate this by applying the algorithm to both the English and Mandarin TDT3 corpora with only minor changes.
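The abstract does not spell out how misses and false alarms are counted. A common approach for segmentation scoring, used in the NIST TDT evaluations, probes pairs of words a fixed distance k apart and asks whether the reference and the hypothesis place them in the same segment. The sketch below assumes that probe-based scheme. The function names, the boundary convention (a boundary b means a new story starts at word b), and the value of k are illustrative assumptions, not the authors' scoring code.

```python
from typing import List, Sequence, Tuple


def segment_ids(boundaries: Sequence[int], n_words: int) -> List[int]:
    """Map each word index to a segment id.

    Assumed convention: a boundary b means a new segment starts at word b.
    """
    starts = set(boundaries)
    ids: List[int] = []
    current = 0
    for i in range(n_words):
        if i in starts and i != 0:
            current += 1
        ids.append(current)
    return ids


def miss_false_alarm(ref: Sequence[int], hyp: Sequence[int],
                     n_words: int, k: int) -> Tuple[float, float]:
    """Probe-based miss and false-alarm rates (hypothetical helper).

    For every word pair (i, i + k): a miss is a pair the reference splits
    but the hypothesis keeps together; a false alarm is a pair the
    reference keeps together but the hypothesis splits.
    """
    r = segment_ids(ref, n_words)
    h = segment_ids(hyp, n_words)
    misses = false_alarms = ref_split_count = ref_same_count = 0
    for i in range(n_words - k):
        ref_split = r[i] != r[i + k]
        hyp_split = h[i] != h[i + k]
        if ref_split:
            ref_split_count += 1
            if not hyp_split:
                misses += 1
        else:
            ref_same_count += 1
            if hyp_split:
                false_alarms += 1
    # Normalize each error count by the number of probes where it could occur.
    p_miss = misses / ref_split_count if ref_split_count else 0.0
    p_fa = false_alarms / ref_same_count if ref_same_count else 0.0
    return p_miss, p_fa
```

For example, with 20 words, a reference boundary at word 10, a hypothesized boundary at word 12, and k = 4, `miss_false_alarm([10], [12], 20, 4)` gives a miss rate of 0.5 and a false-alarm rate of 1/6. A boundary placed exactly on the reference scores (0.0, 0.0). Normalizing misses and false alarms separately keeps the two error types visible, and these are the quantities a weighted segmentation cost would combine.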



Bibliographic reference. Dharanipragada, Satya / Franz, Martin / McCarley, J. Scott / Papineni, K. / Roukos, Salim / Ward, T. / Zhu, W.-J. (2000): "Statistical methods for topic segmentation", In ICSLP-2000, vol.1, 516-519.