Odyssey 2010: The Speaker and Language Recognition Workshop
Brno, Czech Republic
This paper presents a novel framework for unsupervised compensation of intra-session intra-speaker variability in the context of speaker diarization. Audio files are parameterized by sequences of GMM-supervectors representing overlapping short segments of speech. Session-dependent intra-session intra-speaker variability is estimated in an unsupervised manner and is compensated using the nuisance attribute projection (NAP) method. The proposed compensation method is evaluated in the context of speaker diarization in two-speaker conversations. A simple and effective two-speaker diarization algorithm is introduced in which diarization is performed in the compensated supervector space. The proposed diarization algorithm was evaluated on summed telephone conversations and achieved a speaker error rate of 2.8%, a 54% relative error reduction compared to a baseline BIC-based system. Finally, we evaluate the proposed system on a speaker recognition task in the summed-speech condition, where using the proposed diarization system improves speaker recognition accuracy.
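As an illustration of the NAP step mentioned in the abstract, the following is a minimal sketch and not the paper's implementation. It assumes that consecutive overlapping segments mostly come from the same speaker, so that differences between adjacent supervectors approximate intra-speaker variability; that estimation heuristic, the function names `estimate_nuisance_basis` and `nap_project`, the synthetic data, and the choice of `k` are all assumptions made for the example.

```python
import numpy as np


def estimate_nuisance_basis(supervectors, k):
    """Return k orthonormal nuisance directions, shape (dim, k).

    Assumption (not from the abstract): adjacent segments usually share a
    speaker, so their differences mostly reflect intra-speaker variability.
    """
    diffs = supervectors[1:] - supervectors[:-1]
    # Right singular vectors of diffs are the eigenvectors of diffs.T @ diffs,
    # ordered by decreasing variance.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:k].T


def nap_project(supervectors, basis):
    """Apply NAP: x' = x - U U^T x, removing the nuisance subspace."""
    return supervectors - (supervectors @ basis) @ basis.T


# Synthetic session: two speakers, one strong nuisance direction (hypothetical data).
rng = np.random.default_rng(0)
dim, n_segments = 20, 200
nuisance_dir = rng.normal(size=dim)
nuisance_dir /= np.linalg.norm(nuisance_dir)
speaker_means = rng.normal(size=(2, dim))
labels = np.repeat([0, 1], n_segments // 2)

supervectors = (
    speaker_means[labels]
    + 5.0 * rng.normal(size=(n_segments, 1)) * nuisance_dir
    + 0.1 * rng.normal(size=(n_segments, dim))
)

U = estimate_nuisance_basis(supervectors, k=1)
compensated = nap_project(supervectors, U)
```

After projection, `compensated` has no component along the estimated nuisance direction, so a clustering step run in that space would be driven more by the speaker means than by within-speaker variation.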
Bibliographic reference. Aronowitz, Hagai (2010): "Unsupervised Compensation of Intra-Session Intra-Speaker Variability for Speaker Diarization", In Odyssey-2010, paper 025.