Reliable speaker segmentation is critical in many speech processing applications. In this paper, we compare the performance of two speaker segmentation systems: the first is based on a typical state-of-the-art speaker segmentation system, and the second is an improved version of the first. We show that the proposed system performs better because it does not “over-segment” the data. It includes an algorithm that randomly discards some of the change points, with a probability that depends on the system's performance at each moment. The system thus merges adjacent segments when they are, with high probability, spoken by the same speaker. Whenever a change point is discarded, the discard probability rises, since the system made a mistake; when the two adjacent segments belong to different speakers, no mistake was made and the discard probability falls. We demonstrate the improvements of the new system through comparative experiments on data from the Spanish Parliament sessions defined for the 2006 TC-STAR Automatic Speech Recognition evaluation campaign.
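The adaptive discard rule described above can be illustrated with a minimal sketch. This is one possible reading of the abstract, not the authors' implementation. The function names, the fixed additive `step`, the initial probability `p0`, and the `same_speaker` flag are all hypothetical. The flag stands in for whatever same-speaker test the real system applies. The sketch also assumes that the random draw decides whether a candidate change point is examined at all, so that every discard raises the probability.

```python
import random


def update_discard_probability(p, was_mistake, step=0.05):
    """Raise p after a spurious change point, lower it after a genuine one.

    The additive step and the clipping to [0, 1] are assumptions made
    for illustration; the abstract does not specify the update rule.
    """
    p = p + step if was_mistake else p - step
    return min(1.0, max(0.0, p))


def filter_change_points(candidates, p0=0.5, step=0.05, rng=None):
    """Filter candidate change points with an adaptive discard probability.

    candidates: list of (time_sec, same_speaker) pairs, where same_speaker
    is the (hypothetical) outcome of a test on the two adjacent segments.
    Returns (kept_times, final_p).
    """
    rng = rng or random.Random()
    p = p0
    kept = []
    for time_sec, same_speaker in candidates:
        # With probability 1 - p the candidate is accepted unexamined.
        if rng.random() >= p:
            kept.append(time_sec)
            continue
        if same_speaker:
            # Spurious change: merge the segments and raise p.
            p = update_discard_probability(p, True, step)
        else:
            # Genuine change: keep it and lower p.
            kept.append(time_sec)
            p = update_discard_probability(p, False, step)
    return kept, p
```

Under these assumptions, a run of spurious change points drives the probability up, so later candidates are examined and merged more aggressively, which is the mechanism that limits over-segmentation.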
Bibliographic reference. Docio-Fernandez, Laura / Lopez-Otero, Paula / Garcia-Mateo, Carmen (2009): "An adaptive threshold computation for unsupervised speaker segmentation", In INTERSPEECH-2009, 840-843.