ISCA Archive Interspeech 2009

A new quality measure for topic segmentation of text and speech

Mehryar Mohri, Pedro Moreno, Eugene Weinstein

The recent proliferation of large multimedia collections has attracted considerable attention from the speech research community, because speech recognition enables the transcription and indexing of such collections. Topicality information can be used to improve transcription quality and to enable content navigation. In this paper, we propose a novel quality measure for topic segmentation algorithms that improves on previously used measures. Our measure takes into account not only the presence or absence of topic boundaries but also the content of the text or speech segments labeled as topic-coherent. Additionally, we demonstrate that the topic segmentation quality of spoken language can be improved using speech recognition lattices. Using lattices, we observe improvements over the baseline one-best topic model under both the previously existing topic segmentation quality measure and the new measure proposed in this paper (9.4% and 7.0% relative error reduction, respectively).

doi: 10.21437/Interspeech.2009-701

Cite as: Mohri, M., Moreno, P., Weinstein, E. (2009) A new quality measure for topic segmentation of text and speech. Proc. Interspeech 2009, 2743-2746, doi: 10.21437/Interspeech.2009-701

  author={Mehryar Mohri and Pedro Moreno and Eugene Weinstein},
  title={{A new quality measure for topic segmentation of text and speech}},
  booktitle={Proc. Interspeech 2009},