The increasing quantity of video material calls for methods that help users navigate such data, among them topic segmentation techniques. The goal of this article is to improve ASR-based topic segmentation methods so that they cope with the peculiarities of professional-video transcripts (transcription errors and lack of repetitions) while remaining generic enough. To this end, we introduce confidence measures and semantic relations into a segmentation method based on lexical cohesion. We show significant improvements of the F1-measure, +1.7 and +1.9 when integrating confidence measures and semantic relations, respectively. These improvements demonstrate that simple clues can counteract errors in automatic transcripts and the lack of repetitions.
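As an illustration only, the two clues can be sketched on a toy cohesion score. This is not the segmentation method of the paper: it swaps in a plain cosine similarity between two adjacent blocks of transcript, and the function names, the `related` dictionary and the `weight` parameter are all hypothetical. Each word is assumed to come with an ASR confidence score in [0, 1].

```python
from collections import defaultdict
from math import sqrt


def weighted_counts(words):
    """Sum ASR confidence scores per word instead of counting occurrences.

    `words` is a list of (word, confidence) pairs (assumed input format).
    """
    counts = defaultdict(float)
    for word, confidence in words:
        counts[word] += confidence
    return counts


def expand(counts, related, weight=0.5):
    """Give each semantically related word a share of a word's mass.

    `related` maps a word to an iterable of related words (hypothetical
    resource); `weight` is an arbitrary illustrative value.
    """
    expanded = defaultdict(float, counts)
    for word, mass in counts.items():
        for other in related.get(word, ()):
            expanded[other] += weight * mass
    return expanded


def cosine(a, b):
    """Cosine similarity between two sparse word-weight vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cohesion(left, right, related=None):
    """Lexical cohesion between two adjacent blocks of transcribed words."""
    a, b = weighted_counts(left), weighted_counts(right)
    if related:
        a, b = expand(a, related), expand(b, related)
    return cosine(a, b)
```

In this sketch, a shared word recognised with low confidence contributes less to the cohesion between two blocks, and two blocks with no repeated word can still be judged cohesive when `related` links their vocabularies. A low cohesion value between adjacent blocks would then suggest a candidate topic boundary.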
Bibliographic reference. Guinaudeau, Camille / Gravier, Guillaume / Sébillot, Pascale (2010): "Improving ASR-based topic segmentation of TV programs with confidence measures and semantic relations", In INTERSPEECH-2010, 1365-1368.