ISCA Archive ICSLP 2000

Statistical methods for topic segmentation

Satya Dharanipragada, Martin Franz, J. Scott McCarley, K. Papineni, Salim Roukos, T. Ward, W.-J. Zhu

Automatic topic segmentation is an important technology for multimedia archival and retrieval systems. In this paper we present an algorithm for topic segmentation that uses a combination of machine learning, statistical natural language processing, and information retrieval techniques. The performance of this algorithm is measured by considering the misses and false alarms on a manually segmented corpus. We present our results on the widely used TDT2 and TDT3 corpora provided by NIST. Most of the techniques described are independent of the source language. We demonstrate this by applying the algorithm to both the English and Mandarin TDT3 corpora with only minor changes.
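
The abstract does not say how misses and false alarms are counted. As an illustration only, the sketch below uses one common windowed formulation: for every pair of positions k units apart, a miss is counted when the reference places the pair in different segments but the hypothesis does not, and a false alarm in the opposite case. The function names, the boundary encoding, and the window size `k` are assumptions made for this example and are not taken from the paper.

```python
def segment_ids(boundaries, n_units):
    """Map each unit index to the id of the segment containing it.

    A boundary value i means a topic change between unit i-1 and unit i.
    (Hypothetical encoding chosen for this sketch.)
    """
    ids = []
    current = 0
    for i in range(n_units):
        if i in boundaries:
            current += 1
        ids.append(current)
    return ids


def miss_false_alarm(reference, hypothesis, n_units, k):
    """Return (p_miss, p_fa) over all pairs of units that are k apart."""
    ref = segment_ids(set(reference), n_units)
    hyp = segment_ids(set(hypothesis), n_units)
    miss = fa = n_diff = n_same = 0
    for i in range(n_units - k):
        j = i + k
        ref_same = ref[i] == ref[j]
        hyp_same = hyp[i] == hyp[j]
        if ref_same:
            n_same += 1
            if not hyp_same:
                fa += 1    # hypothesis splits a pair the reference keeps together
        else:
            n_diff += 1
            if hyp_same:
                miss += 1  # hypothesis fails to split a pair the reference separates
    p_miss = miss / n_diff if n_diff else 0.0
    p_fa = fa / n_same if n_same else 0.0
    return p_miss, p_fa


if __name__ == "__main__":
    # Ten units with one true boundary before unit 5; the hypothesis adds a spurious one.
    print(miss_false_alarm(reference=[5], hypothesis=[3, 5], n_units=10, k=2))
```

In this formulation a hypothesis with no boundaries yields a miss rate of 1 and a false alarm rate of 0, while over-segmentation drives the false alarm rate up. Reporting the two rates separately makes that trade-off visible.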

Cite as: Dharanipragada, S., Franz, M., McCarley, J.S., Papineni, K., Roukos, S., Ward, T., Zhu, W.-J. (2000) Statistical methods for topic segmentation. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 1, 516-519

  author={Satya Dharanipragada and Martin Franz and J. Scott McCarley and K. Papineni and Salim Roukos and T. Ward and W.-J. Zhu},
  title={{Statistical methods for topic segmentation}},
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 1, 516-519}