Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Robust Speaker Diarization for Meetings: ICSI RT06s Evaluation System

Xavier Anguera, Chuck Wooters, Jose M. Pardo

International Computer Science Institute, USA

In this paper we present the ICSI speaker diarization system submitted to the NIST Rich Transcription evaluation (RT06s) [1] in the meetings domain. These yearly evaluations have, over the last two years, included speaker diarization for two distinct kinds of meetings: conference room and lecture room. The presented system aims to be robust to changes in meeting conditions by not using any training data. We introduce four main improvements over last year’s submission. The first is a new training-free speech/non-speech detection algorithm. The second is a new algorithm for system initialization. The third is a frame purification algorithm that increases the differentiability between clusters. The last is the use of inter-channel delays as features, which greatly improves performance. We report the diarization error rate (DER) of this system on all meeting datasets available to date for the multiple distant microphone (MDM) and single distant microphone (SDM) conditions.
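The DER reported above is, following the NIST definition, the sum of missed speech, false-alarm speech and speaker-error time, divided by the total reference speech time, with hypothesis speakers mapped one-to-one onto reference speakers in the way that minimizes the error. The sketch below computes a simplified frame-level version. It assumes one label per frame (a speaker name or `None` for non-speech), no overlapping speech and no forgiveness collar, so it is not the official NIST scoring tool. The function name `frame_der` is hypothetical, and the brute-force speaker mapping is only practical for a handful of speakers.

```python
from itertools import permutations
from typing import Optional, Sequence


def frame_der(ref: Sequence[Optional[str]], hyp: Sequence[Optional[str]]) -> float:
    """Simplified frame-level DER (hypothetical helper, not the NIST scorer).

    Each element is the speaker label of one frame, or None for non-speech.
    Assumes no overlapping speech and no forgiveness collar.
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same number of frames")
    scored = sum(r is not None for r in ref)  # total reference speech frames
    if scored == 0:
        raise ValueError("reference contains no speech")

    # Speech/non-speech errors.
    missed = sum(r is not None and h is None for r, h in zip(ref, hyp))
    false_alarm = sum(r is None and h is not None for r, h in zip(ref, hyp))

    # Frames where both reference and hypothesis contain speech.
    both = [(r, h) for r, h in zip(ref, hyp) if r is not None and h is not None]
    ref_spk = sorted({r for r, _ in both})
    hyp_spk = sorted({h for _, h in both})

    # Brute-force the one-to-one hyp->ref mapping that maximizes matched frames.
    # Padding with None lets a hypothesis speaker stay unmapped.
    candidates = ref_spk + [None] * len(hyp_spk)
    best_correct = 0
    for perm in permutations(candidates, len(hyp_spk)):
        mapping = dict(zip(hyp_spk, perm))
        correct = sum(mapping[h] == r for r, h in both)
        best_correct = max(best_correct, correct)

    speaker_error = len(both) - best_correct
    return (missed + false_alarm + speaker_error) / scored
```

For example, with reference `["A", "A", "B", "B", None, None]` and hypothesis `["x", "x", "x", "y", None, "y"]`, the best mapping is x→A, y→B. That leaves one speaker-error frame and one false-alarm frame over four reference speech frames, giving a DER of 0.5.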


Bibliographic reference.  Anguera, Xavier / Wooters, Chuck / Pardo, Jose M. (2006): "Robust speaker diarization for meetings: ICSI RT06s evaluation system", In INTERSPEECH-2006, paper 1716-Wed1FoP.6.