8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Trainable Speaker Diarization

Hagai Aronowitz

IBM T.J. Watson Research Center, USA

This paper presents a novel framework for speaker diarization. We explicitly model intra-speaker, inter-segment variability using a speaker-labeled training corpus, and use this model to assess the speaker similarity between speech segments. Segments are first embedded into a segment space using kernel PCA; speaker variability is then modeled explicitly in that space. Our framework leads to a significant improvement in diarization accuracy. Finally, we present a similar method for bandwidth classification.
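
The two stages named in the abstract (kernel-PCA embedding, then variability modeling in the embedded space) can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes each segment is already summarized as a fixed-length feature vector, uses an RBF kernel, and scores segment pairs with a Mahalanobis distance under a pooled within-speaker covariance; all three choices, along with every function name, parameter value, and the synthetic data, are hypothetical stand-ins chosen for illustration.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """RBF kernel matrix between rows of X and rows of Y (kernel choice is an assumption)."""
    d2 = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def fit_kernel_pca(X, n_components, gamma):
    """Fit kernel PCA on training segments X (one row per segment)."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one      # center in feature space
    vals, vecs = np.linalg.eigh(Kc)                 # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]     # keep the largest ones
    alphas = vecs[:, idx] / np.sqrt(vals[idx])      # scale to get projection coefficients
    return {"X": X, "K": K, "alphas": alphas, "gamma": gamma}

def embed(model, Z):
    """Project segments Z into the segment space learned by fit_kernel_pca."""
    K = model["K"]
    Kz = rbf_kernel(Z, model["X"], model["gamma"])
    # Center the new kernel rows consistently with the training kernel.
    Kz_c = Kz - Kz.mean(axis=1, keepdims=True) - K.mean(axis=0)[None, :] + K.mean()
    return Kz_c @ model["alphas"]

def within_speaker_covariance(E, labels, ridge=1e-3):
    """Pooled covariance of embedded segments around their speaker means."""
    labels = np.asarray(labels)
    centered = np.vstack(
        [E[labels == s] - E[labels == s].mean(axis=0) for s in np.unique(labels)]
    )
    W = centered.T @ centered / len(E)
    return W + ridge * np.eye(E.shape[1])           # ridge keeps W invertible

def segment_distance(e1, e2, W):
    """Squared Mahalanobis distance under W (a stand-in for the paper's similarity score)."""
    d = e1 - e2
    return float(d @ np.linalg.solve(W, d))

# Synthetic, speaker-labeled "training corpus": 3 speakers, 10 segments each.
rng = np.random.default_rng(0)
speaker_means = rng.normal(size=(3, 4))
labels = np.repeat(np.arange(3), 10)
X_train = speaker_means[labels] + 0.3 * rng.normal(size=(30, 4))

model = fit_kernel_pca(X_train, n_components=3, gamma=0.5)
E_train = embed(model, X_train)
W = within_speaker_covariance(E_train, labels)

# Three unseen segments: two from speaker 0, one from speaker 1.
new_segments = speaker_means[[0, 0, 1]] + 0.3 * rng.normal(size=(3, 4))
E_new = embed(model, new_segments)
d_same = segment_distance(E_new[0], E_new[1], W)
d_diff = segment_distance(E_new[0], E_new[2], W)
```

In a diarization setting, a pairwise score such as `segment_distance` would typically feed a clustering step; that step is outside the scope of this sketch, and the abstract does not specify how the author performs it.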


Bibliographic reference. Aronowitz, Hagai (2007): "Trainable speaker diarization", in INTERSPEECH-2007, 1861-1864.