12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Cross Likelihood Ratio Based Speaker Clustering Using Eigenvoice Models

D. Wang, Robbie Vogt, Sridha Sridharan, David Dean

Queensland University of Technology, Australia

This paper proposes the use of eigenvoice modeling techniques with the Cross Likelihood Ratio (CLR) as a criterion for speaker clustering within a speaker diarization system. The CLR has previously been shown to be a robust decision criterion for speaker clustering using Gaussian Mixture Models (GMMs). Recently, eigenvoice modeling techniques have become increasingly popular due to their ability to adequately represent a speaker from sparse training data and to better capture differences in speaker characteristics. This paper therefore proposes capitalizing on the advantages of eigenvoice modeling within a CLR framework. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show improved clustering performance, with a 35.1% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.
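The sketch below illustrates the two ingredients the abstract names. It assumes the commonly used CLR formulation from GMM-based clustering, CLR(i, j) = (1/N_i) log[p(X_i | λ_j) / p(X_i | λ_UBM)] + (1/N_j) log[p(X_j | λ_i) / p(X_j | λ_UBM)], where X_i holds the N_i frames of cluster i. It also assumes eigenvoice-adapted cluster models whose mean supervector is m = m_UBM + V y, with the UBM weights and variances kept fixed. The names `DiagGMM`, `eigenvoice_adapt` and `clr` are hypothetical. Training V and estimating the speaker factors y are not shown, and this is not the authors' implementation.

```python
import math


class DiagGMM:
    """Diagonal-covariance GMM (illustrative only)."""

    def __init__(self, weights, means, variances):
        self.weights = list(weights)
        self.means = [list(m) for m in means]
        self.variances = [list(v) for v in variances]

    def frame_log_likelihood(self, x):
        # log sum_k w_k N(x; mu_k, diag(var_k)), computed with log-sum-exp for stability
        comps = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                ll -= 0.5 * (math.log(2 * math.pi * vd) + (xd - md) ** 2 / vd)
            comps.append(ll)
        top = max(comps)
        return top + math.log(sum(math.exp(c - top) for c in comps))

    def avg_log_likelihood(self, frames):
        # (1/N) log p(X | model), assuming frames are independent
        return sum(self.frame_log_likelihood(x) for x in frames) / len(frames)


def eigenvoice_adapt(ubm, V, y):
    """Build a GMM whose mean supervector is m_ubm + V y (weights/variances from the UBM).

    V has (num_components * dim) rows and len(y) columns (assumed layout).
    """
    dim = len(ubm.means[0])
    supervector = [v for mu in ubm.means for v in mu]
    adapted = [
        m + sum(V[r][c] * y[c] for c in range(len(y)))
        for r, m in enumerate(supervector)
    ]
    means = [adapted[k * dim:(k + 1) * dim] for k in range(len(ubm.means))]
    return DiagGMM(ubm.weights, means, ubm.variances)


def clr(frames_i, model_i, frames_j, model_j, ubm):
    """Cross likelihood ratio between clusters i and j (higher means more similar)."""
    term_i = model_j.avg_log_likelihood(frames_i) - ubm.avg_log_likelihood(frames_i)
    term_j = model_i.avg_log_likelihood(frames_j) - ubm.avg_log_likelihood(frames_j)
    return term_i + term_j


# Toy usage with a 1-D, two-component UBM and a rank-1 eigenvoice matrix (made-up values)
ubm = DiagGMM([0.5, 0.5], [[-1.0], [1.0]], [[1.0], [1.0]])
V = [[1.0], [1.0]]  # one eigenvoice that shifts both component means together
model_a = eigenvoice_adapt(ubm, V, [1.0])
model_b = eigenvoice_adapt(ubm, V, [1.2])
model_c = eigenvoice_adapt(ubm, V, [-2.0])
frames_a = [[1.8], [2.1], [2.0]]
frames_b = [[2.2], [1.9], [2.3]]
frames_c = [[-2.5], [-3.1], [-1.2]]
clr_ab = clr(frames_a, model_a, frames_b, model_b, ubm)
clr_ac = clr(frames_a, model_a, frames_c, model_c, ubm)
```

In a bottom-up clustering loop of this kind, the pair of clusters with the highest CLR would typically be merged next, and merging would stop once the best score falls below a tuned threshold.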


Bibliographic reference. Wang, D. / Vogt, Robbie / Sridharan, Sridha / Dean, David (2011): "Cross likelihood ratio based speaker clustering using eigenvoice models", In INTERSPEECH-2011, 957-960.