Sixth International Conference on Spoken Language Processing
(ICSLP 2000)
Beijing, China
October 16-20, 2000
Statistical Methods for Topic Segmentation
Satya Dharanipragada, Martin Franz, J. Scott McCarley, K. Papineni, Salim Roukos, T. Ward, W.-J. Zhu
IBM T.J. Watson Research Center,
Yorktown Heights, NY, USA
Automatic topic segmentation is an important technology
for multimedia archival and retrieval systems. In this paper
we present a topic segmentation algorithm that combines
machine learning, statistical natural language processing,
and information retrieval techniques. The performance of
this algorithm is measured by its misses and false alarms
on a manually segmented corpus. We present results on the
widely used TDT2 and TDT3 corpora provided by NIST. Most of
the techniques described are independent of the source
language. We demonstrate this by applying the algorithm to
both the English and Mandarin TDT3 corpora with only minor
changes.
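
The abstract does not say how misses and false alarms are counted. As a minimal sketch, assuming a window-based comparison of hypothesized and reference boundaries (an assumption, not necessarily the scoring used in the paper), the two error rates could be computed as follows. The function name, its parameters, and the window size are hypothetical and chosen only for illustration.

```python
def segmentation_error_rates(ref_boundaries, hyp_boundaries, n_units, window):
    """Window-based miss and false-alarm rates for a segmentation.

    Assumed convention (not taken from the paper): a boundary index b
    means a topic change immediately before unit b, with 0 < b < n_units.
    For every pair of units (i, i + window) we ask whether the reference
    and the hypothesis place a boundary between them.
    """
    ref = set(ref_boundaries)
    hyp = set(hyp_boundaries)
    misses = false_alarms = 0
    ref_split_count = ref_same_count = 0

    for i in range(n_units - window):
        span = range(i + 1, i + window + 1)
        ref_split = any(b in ref for b in span)
        hyp_split = any(b in hyp for b in span)
        if ref_split:
            ref_split_count += 1
            if not hyp_split:
                misses += 1          # reference boundary not detected
        else:
            ref_same_count += 1
            if hyp_split:
                false_alarms += 1    # boundary hypothesized where none exists

    p_miss = misses / ref_split_count if ref_split_count else 0.0
    p_fa = false_alarms / ref_same_count if ref_same_count else 0.0
    return p_miss, p_fa


# Illustrative toy input: 10 units, one reference boundary before unit 5.
p_miss, p_fa = segmentation_error_rates({5}, {5}, n_units=10, window=2)
```

Reporting the two rates separately, rather than a single accuracy figure, shows the trade-off between under-segmenting and over-segmenting as a detection threshold varies.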
Bibliographic reference.
Dharanipragada, Satya / Franz, Martin / McCarley, J. Scott / Papineni, K. / Roukos, Salim / Ward, T. / Zhu, W.-J. (2000):
"Statistical methods for topic segmentation",
In ICSLP-2000, vol.1, 516-519.