This paper describes an automatic speaker diarization system for natural, multi-speaker meeting conversations recorded with a single central microphone. The new system is robust to different acoustic environments: it requires neither pre-trained models nor development sets to initialize its parameters, and it determines the model complexity automatically. It adapts the segment models from a universal background model, and uses the cross-likelihood ratio instead of the Bayesian Information Criterion (BIC) for merging. Finally, it uses an intra-cluster/inter-cluster ratio as the stopping criterion. Together, these changes reduce the speaker diarization error rate from 21.76% to 17.21% compared with the baseline system.
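As an illustration of the merging criterion named above, the following is a minimal sketch of a cross-likelihood ratio (CLR) between two clusters. It uses the commonly cited form of the CLR, in which each cluster's frames are scored against the other cluster's model and normalized by a background model; the abstract does not give the paper's exact formula, so this form is an assumption. For brevity, each model is a single diagonal Gaussian fitted directly to the data rather than a mixture adapted from a universal background model, and the names `fit_gaussian`, `avg_loglik`, and `clr` are hypothetical.

```python
import numpy as np


def fit_gaussian(frames):
    """Fit a diagonal Gaussian (mean, variance) to an array of feature frames.

    Stand-in for a speaker model; the paper adapts models from a UBM instead.
    """
    frames = np.asarray(frames, dtype=float)
    # A small floor keeps the variance strictly positive.
    return frames.mean(axis=0), frames.var(axis=0) + 1e-6


def avg_loglik(frames, model):
    """Average per-frame log-likelihood of frames under a diagonal Gaussian."""
    mean, var = model
    frames = np.asarray(frames, dtype=float)
    per_dim = -0.5 * (np.log(2.0 * np.pi * var) + (frames - mean) ** 2 / var)
    return per_dim.sum(axis=1).mean()


def clr(frames_i, frames_j, model_i, model_j, background):
    """Cross-likelihood ratio between clusters i and j (assumed common form).

    Each cluster's frames are scored on the other cluster's model, relative
    to the background model. Larger values suggest the same speaker.
    """
    term_i = avg_loglik(frames_i, model_j) - avg_loglik(frames_i, background)
    term_j = avg_loglik(frames_j, model_i) - avg_loglik(frames_j, background)
    return term_i + term_j
```

In an agglomerative clustering loop, the pair of clusters with the largest CLR would be the candidate for merging; when to stop merging is a separate decision, which the paper handles with its intra-cluster/inter-cluster ratio.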
Bibliographic reference. Fu, Rong / Benest, Ian D. (2007): "An improved speaker diarization system", In INTERSPEECH-2007, 2605-2608.