ISCA Archive Odyssey 2004

Optimization of GMM training for speaker verification

Yongxin Zhang, Michael S. Scordilis

EM training of Gaussian mixture models (GMMs) often suffers from local maxima and singularities in the likelihood space. In this paper, we present a new Modified Split-and-Merge EM algorithm (MSMEM) for speaker verification, which performs split-and-merge operations to escape local maxima and reduce the chance of generating singularities. Two modified criteria for selecting split-and-merge candidates, tailored to the speaker verification task, improve the overall likelihood of both training and testing data. Furthermore, a modified adaptive variance flooring is introduced into the new EM procedure. Experiments on synthetic data show the advantages of MSMEM, and global-threshold EER results on a speaker verification task using the TIMIT database confirm the improvement in system performance.
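
As a rough illustration of the singularity problem mentioned above, the following minimal sketch runs plain EM on a one-dimensional GMM and clamps each component variance to a floor. It is not the MSMEM algorithm: the split-and-merge steps are omitted, and the fixed-ratio floor (`floor_ratio` times the global data variance) is an assumed stand-in for the paper's modified adaptive variance flooring. The function name `em_gmm_1d` and all parameters are hypothetical.

```python
import math


def em_gmm_1d(data, means, variances, weights, n_iter=50, floor_ratio=0.01):
    """Plain EM for a 1-D GMM with a simple variance floor (illustrative only)."""
    n, k = len(data), len(means)
    means, variances, weights = list(means), list(variances), list(weights)

    # Assumed flooring rule: a fixed fraction of the global data variance.
    mu = sum(data) / n
    var_floor = floor_ratio * sum((x - mu) ** 2 for x in data) / n

    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [
                w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])

        # M-step: re-estimate parameters, then apply the variance floor so
        # that no component can collapse onto a single point.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, var_floor)
            weights[j] = nj / n

    return means, variances, weights, var_floor


# Two tight, well-separated clusters; the floor keeps both variances
# bounded away from zero.
data = [0.0, 0.1, -0.1, 0.2, -0.2, 5.0, 5.1, 4.9, 5.2, 4.8]
means, variances, weights, var_floor = em_gmm_1d(
    data, means=[-1.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
```

Without the `max(var, var_floor)` step, a component whose responsibility concentrates on one point would see its variance shrink toward zero and its likelihood grow without bound, which is the kind of singularity the abstract refers to.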


Cite as: Zhang, Y., Scordilis, M.S. (2004) Optimization of GMM training for speaker verification. Proc. The Speaker and Language Recognition Workshop (Odyssey 2004), 231-236

@inproceedings{zhang04_odyssey,
  author={Yongxin Zhang and Michael S. Scordilis},
  title={{Optimization of GMM training for speaker verification}},
  year=2004,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2004)},
  pages={231--236}
}