In this paper, we propose a discriminative extension to agglomerative hierarchical clustering, a typical technique for speaker diarization, that fits seamlessly with most state-of-the-art diarization algorithms. We propose to use maximum mutual information training with bootstrapping, i.e., initial predictions are used as input for retraining the models in an unsupervised fashion. This article describes this new approach, analyzes its behavior, and presents results on the official NIST Rich Transcription datasets. We show an absolute improvement of 4% DER with respect to the generative-approach baseline. We also observe a strong correlation between the original error and the amount of improvement: the better our predicted labels are, the more gain we obtain from discriminative training, which we interpret as a strong indication of the high potential of the extension.
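The bootstrapping idea described above — initial cluster labels are used to retrain per-speaker models, whose predictions then relabel the data — can be sketched as follows. This is a hypothetical, minimal illustration using diagonal Gaussians per cluster; the function name `retrain_and_relabel` and all modeling choices are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of unsupervised bootstrapping for diarization:
# refit a model per cluster from current labels, then reassign frames
# to the highest-likelihood cluster, and repeat until labels stabilize.
import numpy as np

def retrain_and_relabel(features, labels, n_iters=5):
    """Iteratively refit a diagonal Gaussian per cluster and relabel frames."""
    labels = labels.copy()
    for _ in range(n_iters):
        clusters = np.unique(labels)
        # retrain: one diagonal Gaussian per current cluster
        means = {c: features[labels == c].mean(axis=0) for c in clusters}
        stds = {c: features[labels == c].std(axis=0) + 1e-6 for c in clusters}
        # relabel: assign each frame to its most likely cluster
        new_labels = labels.copy()
        for i, x in enumerate(features):
            ll = {c: -0.5 * np.sum(((x - means[c]) / stds[c]) ** 2
                                   + 2.0 * np.log(stds[c]))
                  for c in clusters}
            new_labels[i] = max(ll, key=ll.get)
        if np.array_equal(new_labels, labels):
            break  # converged: predictions no longer change
        labels = new_labels
    return labels
```

On well-separated data with mostly correct initial labels, a few iterations of this loop correct the mislabeled frames, which is consistent with the abstract's observation that better initial labels yield larger gains.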
Bibliographic reference. Vinyals, Oriol / Friedland, Gerald / Morgan, Nelson (2010): "Discriminative training for hierarchical clustering in speaker diarization", In INTERSPEECH-2010, 2326-2329.