14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Exploring Methods of Improving Speaker Accuracy for Speaker Diarization

Mary Tai Knox, Nikki Mirghafori, Gerald Friedland


The focus of this work is to improve the speaker diarization error rate, and more specifically the speaker error rate. We investigate two methods of improving the speaker error rate: modifying the minimum duration constraint and incorporating novel purification techniques. First, in the final step of the speaker diarization algorithm, we replace the minimum duration constraint with a simple smoothing algorithm, which averages the log-likelihoods for each of the hypothesized speakers. This method improves the speaker error rate by 12% relative for the multiple distant microphone (MDM) condition. Second, we utilize the difference between the largest and second-largest log-likelihoods to identify frames which are believed to be correct (or "pure"). The difference value is shown to be more effective at separating correct frames from incorrect frames than the previously used maximum log-likelihood value. Using only the "pure" frames, the cluster models are retrained and segmentation is performed using the above-mentioned smoothing technique. The proposed purification and smoothing together reduce the speaker error rate over the baseline; however, the result is worse than performing the smoothing step alone.
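
The two ideas described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the per-frame log-likelihoods are already available as a NumPy array of shape (frames, speakers). The function names, the window length, the edge handling, and the threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def smooth_log_likelihoods(loglik, window=5):
    """Average each speaker's log-likelihoods over a centered window.

    loglik: array of shape (n_frames, n_speakers). The window length is an
    assumed parameter; the abstract does not state one.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    # Repeat the first and last frames so that edge frames are averaged over
    # a full window (an assumption made here for simplicity).
    padded = np.pad(loglik, ((half, half), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    columns = [
        np.convolve(padded[:, k], kernel, mode="valid")
        for k in range(loglik.shape[1])
    ]
    return np.stack(columns, axis=1)


def segment_with_smoothing(loglik, window=5):
    """Assign each frame to the speaker with the highest smoothed score."""
    return np.argmax(smooth_log_likelihoods(loglik, window), axis=1)


def pure_frame_mask(loglik, threshold):
    """Mark frames whose top two log-likelihoods differ by at least threshold.

    The threshold value is hypothetical and would need tuning on held-out data.
    """
    top_two = np.sort(loglik, axis=1)[:, -2:]
    margin = top_two[:, 1] - top_two[:, 0]
    return margin >= threshold
```

For example, with two speakers and scores [[-1, -3], [-1, -3], [-2, -1.5], [-1, -3], [-1, -3]], the raw per-frame decision flips to the second speaker at the third frame, whereas segment_with_smoothing with window=3 assigns all five frames to the first speaker. With threshold=1.0, pure_frame_mask flags every frame except the third, whose margin is only 0.5.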


Bibliographic reference.  Knox, Mary Tai / Mirghafori, Nikki / Friedland, Gerald (2013): "Exploring methods of improving speaker accuracy for speaker diarization", In INTERSPEECH-2013, 2783-2787.