ODYSSEY 2004 - The Speaker and Language Recognition Workshop

May 31 - June 3, 2004
Toledo, Spain

Optimization of GMM Training For Speaker Verification

Yongxin Zhang, Michael S. Scordilis

Department of Electrical and Computer Engineering, University of Miami, FL, USA

EM training of Gaussian mixture models (GMMs) often suffers from local maxima and singularities in the likelihood space. In this paper, we present a new Modified Split-and-Merge EM algorithm (MSMEM) for speaker verification, which performs split-and-merge operations to escape from local maxima and to reduce the chance of generating singularities. Using two modified criteria for selecting split-and-merge candidates in the speaker verification task, the overall likelihoods of both the training and the testing data are improved. Furthermore, a modified adaptive variance flooring is introduced into the new EM procedure. Experiments on synthetic data show the advantages of MSMEM. Equal error rate (EER) results with a global threshold on a speaker verification task using the TIMIT database confirm the improvement in system performance.
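To illustrate the singularity problem mentioned in the abstract, the following is a minimal sketch of one EM iteration for a one-dimensional GMM with a fixed variance floor. It is not the MSMEM algorithm and not the paper's modified adaptive variance flooring, neither of which is specified in this abstract; the function name `em_step_1d`, the parameter `var_floor`, and the toy data are assumptions made for illustration only.

```python
import math


def em_step_1d(data, weights, means, variances, var_floor):
    """One EM iteration for a 1-D Gaussian mixture with a fixed variance floor.

    Hypothetical helper for illustration; a fixed floor stands in for the
    adaptive flooring described in the paper.
    """
    k = len(weights)
    n = len(data)

    # E-step: resp[i][j] is the posterior probability of component j given x_i.
    resp = []
    for x in data:
        dens = [
            weights[j]
            * math.exp(-0.5 * (x - means[j]) ** 2 / variances[j])
            / math.sqrt(2.0 * math.pi * variances[j])
            for j in range(k)
        ]
        total = sum(dens)
        resp.append([d / total for d in dens])

    # M-step: re-estimate weights, means, and variances.
    new_w, new_m, new_v = [], [], []
    for j in range(k):
        nj = sum(resp[i][j] for i in range(n))
        if nj < 1e-12:
            # Component received no data: keep its previous parameters.
            new_w.append(weights[j])
            new_m.append(means[j])
            new_v.append(variances[j])
            continue
        mean_j = sum(resp[i][j] * data[i] for i in range(n)) / nj
        var_j = sum(resp[i][j] * (data[i] - mean_j) ** 2 for i in range(n)) / nj
        new_w.append(nj / n)
        new_m.append(mean_j)
        # Flooring: without it, a component that locks onto identical points
        # gets zero variance and the likelihood becomes unbounded.
        new_v.append(max(var_j, var_floor))
    return new_w, new_m, new_v


# Toy data: three identical points at 5.0 would drive one variance to zero.
data = [0.0, 0.1, -0.1, 5.0, 5.0, 5.0]
w, m, v = [0.5, 0.5], [0.0, 5.0], [1.0, 1.0]
for _ in range(10):
    w, m, v = em_step_1d(data, w, m, v, var_floor=1e-3)
```

In this toy run the second component settles on the three identical points, so its unfloored variance estimate would collapse to zero; the floor keeps it at `var_floor`. Split-and-merge moves address the separate problem of local maxima and are not sketched here.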


Bibliographic reference.  Zhang, Yongxin / Scordilis, Michael S. (2004): "Optimization of GMM training for speaker verification", In ODYS-2004, 231-236.