11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Discriminative Training for Hierarchical Clustering in Speaker Diarization

Oriol Vinyals, Gerald Friedland, Nelson Morgan


In this paper, we propose a discriminative extension to agglomerative hierarchical clustering, a typical technique for speaker diarization, that fits seamlessly with most state-of-the-art diarization algorithms. We propose maximum mutual information training with bootstrapping, i.e., initial predictions are used as input for retraining the models in an unsupervised fashion. This article describes the new approach, analyzes its behavior, and presents results on the official NIST Rich Transcription datasets. We show an absolute improvement of 4% in diarization error rate (DER) with respect to the generative baseline. We also observe a strong correlation between the original error and the amount of improvement: the better our predicted labels are, the more gain we obtain from discriminative training, which we interpret as a strong indication of the high potential of the extension.
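
The bootstrapping idea in the abstract (first-pass predictions reused as training labels for a discriminative model) can be illustrated with a toy sketch. This is not the paper's system: a one-dimensional logistic regression stands in for the MMI-trained models, the data are hand-made, and all names (`train_logistic`, `bootstrap`, `first_pass`) are hypothetical. The first-pass labels are assumed to come from some earlier clustering step and contain two errors.

```python
import math

def train_logistic(xs, labels, epochs=200, lr=0.5):
    """Fit a 1-D logistic regression by gradient ascent on the
    conditional log-likelihood (a stand-in for MMI training)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, t in zip(xs, labels):
            z = max(-30.0, min(30.0, w * x + b))  # clamp to keep exp() safe
            p = 1.0 / (1.0 + math.exp(-z))
            grad_w += (t - p) * x
            grad_b += (t - p)
        w += lr * grad_w / n
        b += lr * grad_b / n
    return w, b

def predict(xs, w, b):
    """Assign each frame to speaker 1 if the score is positive, else 0."""
    return [1 if w * x + b > 0 else 0 for x in xs]

def bootstrap(xs, initial_labels, rounds=3):
    """Retrain on the current predicted labels, relabel, and repeat
    until the labels stop changing or the round limit is reached."""
    labels = list(initial_labels)
    for _ in range(rounds):
        w, b = train_logistic(xs, labels)
        new_labels = predict(xs, w, b)
        if new_labels == labels:
            break
        labels = new_labels
    return labels

def error_rate(predicted, reference):
    """Fraction of frames whose label differs from the reference."""
    wrong = sum(p != r for p, r in zip(predicted, reference))
    return wrong / len(reference)

# Toy 1-D "features" for two speakers (assumed data, not from the paper).
xs = [-3.0, -2.5, -2.0, -1.5, -1.0, 1.0, 1.5, 2.0, 2.5, 3.0]
true_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Assumed first-pass output with two mislabeled frames.
first_pass = list(true_labels)
first_pass[2] = 1
first_pass[7] = 0

refined = bootstrap(xs, first_pass)
print(error_rate(first_pass, true_labels), error_rate(refined, true_labels))
```

In this toy case the retrained model corrects both first-pass errors because most of the initial labels are right, which mirrors the abstract's observation that better initial predictions leave more room for discriminative retraining to help. Real diarization features and error patterns are far less forgiving.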

Bibliographic reference. Vinyals, Oriol / Friedland, Gerald / Morgan, Nelson (2010): "Discriminative training for hierarchical clustering in speaker diarization", In INTERSPEECH-2010, 2326-2329.