The universal background model based Gaussian mixture modeling (GMM-UBM) approach is widely used for speaker identification, with a GMM characterizing each speaker's voice. Model parameters are generally estimated under the maximum likelihood (ML) or maximum a posteriori (MAP) criterion. However, ML and MAP estimation do not take into account the interspeaker information needed to discriminate between speakers. To overcome this limitation, we design a discriminative performance metric that captures interspeaker variability and thereby improves the classification performance of the GMM-UBM system. We present a learning algorithm that tunes the Gaussian mixture weights by optimizing the detection performance of the GMM classifiers, and we design an objective function that directly relates the model parameters to the performance metric. We compare the proposed method with the GMM-UBM system on the 2001 NIST SRE corpus. Experimental results demonstrate that the proposed learning algorithm considerably improves the GMM-UBM system on speaker identification.
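As a rough illustration of the GMM-UBM scoring that the abstract builds on, the Python sketch below computes an average per-frame log-likelihood ratio between a speaker GMM and the UBM, then picks the best-scoring speaker. It is not the authors' learning algorithm or objective function, which the abstract does not specify. The sketch assumes diagonal covariances and assumes that each speaker model shares the UBM's means and variances and differs only in its mixture weights; that is a simplification chosen here because the abstract mentions weight tuning. All function names and toy values are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """log sum_k w_k N(x; mean_k, var_k), computed with log-sum-exp."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def average_llr(frames, speaker_weights, ubm_weights, means, variances):
    """Average per-frame log-likelihood ratio of a speaker model vs. the UBM.

    Assumption (illustrative only): both models share means and variances
    and differ only in their mixture weights.
    """
    total = 0.0
    for x in frames:
        total += gmm_log_likelihood(x, speaker_weights, means, variances)
        total -= gmm_log_likelihood(x, ubm_weights, means, variances)
    return total / len(frames)


def identify(frames, speaker_models, ubm_weights, means, variances):
    """Return the name of the speaker whose model gives the highest score."""
    return max(
        speaker_models,
        key=lambda name: average_llr(
            frames, speaker_models[name], ubm_weights, means, variances
        ),
    )


# Toy two-component, two-dimensional setup (hypothetical values).
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
ubm_weights = [0.5, 0.5]
speaker_models = {"spk_a": [0.9, 0.1], "spk_b": [0.1, 0.9]}
frames = [[0.1, -0.2], [0.0, 0.3]]  # frames lying near the first component

best = identify(frames, speaker_models, ubm_weights, means, variances)
```

In this toy setup the test frames lie close to the first component, so the model that places more weight on that component scores higher. A discriminative weight-tuning step, as described in the abstract, would adjust such weights using data from competing speakers rather than fixing them by hand.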
Bibliographic reference. Dehzangi, Omid / Ma, Bin / Chng, Eng Siong / Li, Haizhou (2010): "A discriminative performance metric for GMM-UBM speaker identification", In INTERSPEECH-2010, 2114-2117.