Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Story Segmentation and Topic Detection for Recognized Speech

S. Dharanipragada, Martin Franz, J. S. McCarley, Salim Roukos, T. Ward

IBM T. J. Watson Research Center, Yorktown Heights, NY, USA

In this paper we present algorithms for story segmentation, topic detection, and topic tracking. The algorithms use a combination of machine learning, statistical natural language processing, and information retrieval techniques. The story segmentation algorithm is a two-stage algorithm that uses a decision-tree-based probabilistic model in the first stage and incorporates aspects of our topic detection system via an information-retrieval-based refinement scheme in the second stage. The topic detection and tracking algorithm is an incremental clustering algorithm that employs a novel dynamic cluster-dependent similarity measure between documents and clusters. The performance of these algorithms is measured on the 1998 DARPA-sponsored Topic Detection and Tracking Phase 2 (TDT2) evaluation task.
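For readers unfamiliar with incremental clustering, the general idea can be illustrated with a minimal single-pass sketch. This is not the authors' algorithm: the abstract does not specify the dynamic cluster-dependent similarity measure, so plain cosine similarity over raw term counts is swapped in here, and the function names, the whitespace tokenizer, and the threshold value of 0.3 are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def incremental_cluster(docs, threshold=0.3):
    """Assign each document, in arrival order, to the most similar
    existing cluster, or start a new cluster if none is similar enough.

    Assumption: clusters are represented by summed term counts and
    compared with plain cosine similarity, a stand-in for the paper's
    cluster-dependent measure, which the abstract does not describe.
    """
    centroids = []  # one Counter of summed term counts per cluster
    labels = []     # cluster index assigned to each document
    for text in docs:
        vec = Counter(text.lower().split())
        scores = [cosine(vec, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            centroids[best].update(vec)  # fold the document into the cluster
            labels.append(best)
        else:
            centroids.append(vec)        # open a new cluster (a new topic)
            labels.append(len(centroids) - 1)
    return labels
```

Because each document is processed once and decisions are never revisited, a scheme of this shape suits a stream of incoming stories; the threshold controls how readily a new topic is declared.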


Bibliographic reference.  Dharanipragada, S. / Franz, Martin / McCarley, J. S. / Roukos, Salim / Ward, T. (1999): "Story segmentation and topic detection for recognized speech", In EUROSPEECH'99, 2435-2438.