This work focuses on speaker clustering methods used in speaker diarization systems. Speaker clustering groups together segments that belong to the same speaker and is usually applied in the last stage of the speaker diarization process. We concentrate on developing suitable representations of speaker segments for clustering. We implement two speaker clustering systems. The first is a standard approach that uses bottom-up agglomerative clustering with the Bayesian Information Criterion as the merging criterion. The second is a fusion-based speaker clustering system in which speaker segments are modeled by acoustic and prosodic representations. In this way we additionally model the speakers' prosodic and phonetic characteristics and combine them with the basic acoustic information about the speakers. This leads to improved clustering of segments when speakers have similar acoustic properties and when acoustic conditions are poor.
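To illustrate the first system, the following is a minimal sketch of bottom-up agglomerative clustering with a BIC-based merging criterion. It is not the authors' implementation. It assumes that each segment is a matrix of feature frames modeled by a single full-covariance Gaussian, that two clusters are merged while the lowest pairwise ΔBIC is negative, and that the penalty weight `lam` is 1.0. The names `delta_bic` and `bic_cluster` are hypothetical.

```python
import numpy as np


def delta_bic(x, y, lam=1.0):
    """Delta-BIC between two clusters of frames (rows = frames).

    Assumption: each cluster is one full-covariance Gaussian.
    Negative values favor merging (same speaker).
    """
    n1, d = x.shape
    n2 = y.shape[0]
    n = n1 + n2

    def logdet(z):
        # Maximum-likelihood covariance; reshape handles the d == 1 case.
        cov = np.cov(z, rowvar=False, bias=True).reshape(d, d)
        return np.linalg.slogdet(cov)[1]

    # Penalty for the extra parameters of the two-model hypothesis.
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    gain = 0.5 * (n * logdet(np.vstack([x, y]))
                  - n1 * logdet(x) - n2 * logdet(y))
    return gain - penalty


def bic_cluster(segments, lam=1.0):
    """Greedy bottom-up clustering; returns lists of segment indices."""
    clusters = [np.asarray(s, dtype=float) for s in segments]
    labels = [[i] for i in range(len(clusters))]
    while len(clusters) > 1:
        # Find the pair with the lowest delta-BIC.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], lam)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0:  # stopping criterion: no pair favors a merge
            break
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        labels[i] = labels[i] + labels[j]
        del clusters[j]
        del labels[j]
    return labels
```

Calling `bic_cluster` with a list of per-segment feature matrices returns one list of segment indices per hypothesized speaker. The fusion-based system described in the abstract would replace or augment the acoustic score with prosodic and phonetic information; how that fusion is done is not specified here and is not sketched.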
Bibliographic reference. Žibert, Janez / Mihelič, France (2011): "Prosodic and phonetic features for speaker clustering in speaker diarization systems", In INTERSPEECH-2011, 1033-1036.