Fine-grained segmentation of radio or TV broadcasts is an essential step for most multimedia processing tasks. Applying segmentation algorithms to the speech transcripts seems straightforward, yet most of these algorithms are poorly suited to short segments or noisy data. In this paper, we propose a new segmentation technique inspired by the image segmentation field and relying on a new way to compute similarities between candidate segments. This new topic segmentation technique is evaluated on two corpora of French TV broadcasts, on which it largely outperforms existing state-of-the-art approaches.
Bibliographic reference. Claveau, Vincent / Lefèvre, Sébastien (2011): "Topic segmentation of TV-streams by mathematical morphology and vectorization", In INTERSPEECH-2011, 1105-1108.