Accessing Information in Spoken Audio
April 19-20, 1999
In this paper, we present a first approach to building an automatic system for speaker-based segmentation of broadcast news. The system is based on a "Chop-and-Recluster" method and is developed in the framework of the THISL project. The "Chop" procedure uses metric-based segmentation, for which several distances have been investigated. The "Recluster" procedure relies on a "bottom-up" clustering of the segments obtained beforehand, each represented by a non-parametric model. Various hierarchical clustering schemes have been tested. Experiments on BBC broadcast news recordings show that the system can detect real speaker changes with high accuracy (mean error = 0.7 s) and a fair false alarm rate (mean false alarm rate = 5.5%). The "Recluster" procedure can produce homogeneous clusters, but it is not yet robust enough to handle more complex classification tasks.
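As an illustration only, the two stages can be sketched in a few lines of Python. This is not the authors' implementation. The abstract does not say which distances or clustering schemes were retained, and it specifies non-parametric segment models, so the sketch swaps in a simpler stand-in: a symmetric Kullback-Leibler divergence between single Gaussians fitted to scalar features. The function names (`gauss_stats`, `sym_kl`, `chop`, `recluster`), the window length, the thresholds, and the synthetic data are all assumptions made for the example.

```python
def gauss_stats(xs, var_floor=1e-3):
    """Mean and floored variance of a list of scalar features."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, max(v, var_floor)


def sym_kl(a, b):
    """Symmetric KL divergence between 1-D Gaussians fitted to a and b.

    Assumption: a stand-in for the distances studied in the paper.
    """
    ma, va = gauss_stats(a)
    mb, vb = gauss_stats(b)
    return 0.5 * (va / vb + vb / va - 2.0
                  + (ma - mb) ** 2 * (1.0 / va + 1.0 / vb))


def chop(frames, win=20, threshold=5.0):
    """'Chop': frame indices where the distance between the two adjacent
    windows exceeds the threshold and is the largest within +/- win frames."""
    last = len(frames) - win
    dist = {t: sym_kl(frames[t - win:t], frames[t:t + win])
            for t in range(win, last + 1)}
    changes = []
    for t, d in dist.items():
        lo, hi = max(win, t - win), min(last, t + win)
        if d > threshold and d == max(dist[u] for u in range(lo, hi + 1)):
            changes.append(t)
    return changes


def recluster(segments, threshold=5.0):
    """'Recluster': bottom-up merging of the closest pair of clusters until
    no pair is closer than the threshold. Returns one label per segment."""
    clusters = [[i] for i in range(len(segments))]  # segment indices
    pooled = [list(s) for s in segments]            # pooled frames
    while len(clusters) > 1:
        d, i, j = min((sym_kl(pooled[i], pooled[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        if d > threshold:
            break
        clusters[i] += clusters.pop(j)
        pooled[i] += pooled.pop(j)
    labels = [0] * len(segments)
    for label, members in enumerate(clusters):
        for seg in members:
            labels[seg] = label
    return labels


# Synthetic toy signal (hypothetical data): speaker A, then B, then A again.
means = [0.0] * 50 + [5.0] * 50 + [0.0] * 50
frames = [m + 0.1 * (-1) ** i for i, m in enumerate(means)]

changes = chop(frames)
bounds = [0] + changes + [len(frames)]
segments = [frames[a:b] for a, b in zip(bounds, bounds[1:])]
labels = recluster(segments)
print(changes, labels)
```

On the toy signal, the first and third segments should end up in the same cluster, since they come from the same synthetic speaker. A real system would operate on multidimensional acoustic feature vectors, and its thresholds would have to be tuned on held-out data.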
Bibliographic reference. Couvreur, Laurent / Boite, Jean-Marc (1999): "Speaker tracking in broadcast audio material in the framework of the THISL project", In Access-Audio-1999, 84-89.