INTERSPEECH 2004 - ICSLP
This paper describes an automatic speaker segmentation and clustering system for natural, multi-speaker meeting conversations recorded with multiple distant microphones. The system was evaluated on the speaker diarization task of the NIST RT-04S Meeting Recognition Evaluation and achieved a speaker diarization error of 28.17%. The system also aims to provide automatic speech segments and speaker grouping information for speech recognition, a necessary prerequisite for subsequent audio processing. Speech recognition achieved a word error rate of 44.5%.
Bibliographic reference. Jin, Qin / Schultz, Tanja (2004): "Speaker segmentation and clustering in meetings", In INTERSPEECH-2004, 597-600.