This paper presents a novel framework for speaker diarization. We explicitly model intra-speaker inter-segment variability using a speaker-labeled training corpus and use this modeling to assess the speaker similarity between speech segments. Modeling is done by embedding segments into a segment-space using kernel-PCA, followed by explicit modeling of speaker variability in the segment-space. Our framework leads to a significant improvement in diarization accuracy. Finally, we present a similar method for bandwidth classification.
Bibliographic reference. Aronowitz, Hagai (2007): "Trainable speaker diarization", In INTERSPEECH-2007, 1861-1864.