8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Unsupervised Topic Discovery Applied to Segmentation of News Transcriptions

Sreenivasa Sista, Amit Srivastava, Francis Kubala, Richard Schwartz

BBN Technologies, USA

Audio transcriptions from Automatic Speech Recognition systems are a continuous stream of words that is difficult to read. Segmenting these transcriptions into thematically distinct stories and categorizing the stories by topic increases readability and comprehensibility. However, manually defined topic categories are rarely available, and the cost of annotating a large corpus with thousands of distinct topics is high. We describe a procedure for applying the Unsupervised Topic Discovery (UTD) algorithm to the Thematic Story Segmentation procedure in order to segment broadcast news episodes into stories and to assign automatic topic labels to these stories. We report the results of using automatic topics for the task of story segmentation on a collection of news episodes in English and Arabic. Our results indicate that story segmentation performance with automatic topic annotations from UTD is on par with the performance obtained with manual topic annotations.


Bibliographic reference. Sista, Sreenivasa / Srivastava, Amit / Kubala, Francis / Schwartz, Richard (2003): "Unsupervised topic discovery applied to segmentation of news transcriptions", in EUROSPEECH-2003, 2833-2836.