10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

A New Quality Measure for Topic Segmentation of Text and Speech

Mehryar Mohri, Pedro Moreno, Eugene Weinstein

Google Inc., USA

The recent proliferation of large multimedia collections has attracted considerable attention from the speech research community, because speech recognition enables the transcription and indexing of such collections. Topicality information can be used to improve transcription quality and to enable content navigation. In this paper, we propose a novel quality measure for topic segmentation algorithms that improves on previously used measures. Our measure takes into account not only the presence or absence of topic boundaries but also the content of the text or speech segments labeled as topic-coherent. Additionally, we demonstrate that the topic segmentation quality of spoken language can be improved using speech recognition lattices. With lattices, we observe improvements over the baseline one-best topic model under both the previously existing topic segmentation quality measure and the new measure proposed in this paper (9.4% and 7.0% relative error reduction, respectively).


Bibliographic reference. Mohri, Mehryar / Moreno, Pedro / Weinstein, Eugene (2009): "A new quality measure for topic segmentation of text and speech", in INTERSPEECH-2009, 2743-2746.