We address the speaker partitioning problem on datasets composed of two-speaker conversations. Good overall diarization performance is desirable in this setting, but even when it is achieved, partitioning performance can be severely degraded if some of the recordings are incorrectly segmented. We show that a bottom-up speaker clustering approach for partitioning two-speaker conversation datasets is sensitive to diarization errors, to the point that the Diarization Error Rate of every recording should be as low as 1% to avoid degradation caused by the diarization process. Finally, we propose a set of confidence measures, combined with a logistic regression approach, to detect those conversations whose segmentation hypothesis is reliable enough for speaker clustering, and show that this improves clustering performance at the expense of missing a small portion of the speakers in the dataset.
Bibliographic reference. Vaquero, Carlos / Ortega, Alfonso / Lleida, Eduardo (2011): "Partitioning of two-speaker conversation datasets", In INTERSPEECH-2011, 385-388.
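
The reliability-detection idea in the abstract (confidence measures fed to a logistic regression, with only the conversations judged reliable passed on to speaker clustering) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not say which confidence measures are used, so the two feature values per conversation, the toy training data, the 0.5 threshold, and the plain gradient-descent training are all hypothetical and are not the authors' implementation.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n = len(features)
    n_feat = len(features[0])
    weights = [0.0] * n_feat
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for x, y in zip(features, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            err = sigmoid(score) - y
            for j in range(n_feat):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def reliability(x, weights, bias):
    """Estimated probability that a segmentation hypothesis is reliable."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def select_reliable(conversations, weights, bias, threshold=0.5):
    """Keep only conversation ids whose reliability reaches the threshold."""
    return [cid for cid, x in conversations
            if reliability(x, weights, bias) >= threshold]


# Hypothetical training set: two made-up confidence measures per conversation.
# Label 1 means the diarization was accurate enough, 0 means it was not.
TRAIN_X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.85],
           [0.2, 0.3], [0.1, 0.2], [0.3, 0.1]]
TRAIN_Y = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(TRAIN_X, TRAIN_Y)

# Hypothetical unseen conversations; only the kept ones would be clustered.
candidates = [("conv_a", [0.85, 0.9]), ("conv_b", [0.15, 0.2])]
kept = select_reliable(candidates, weights, bias)
```

Raising the threshold in this sketch discards more conversations, which mirrors the trade-off stated in the abstract: cleaner input to the clustering stage in exchange for losing the speakers that appear only in the rejected recordings.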