This paper compares three approaches to building phoneme-specific Gaussian mixture model (GMM) speaker recognition systems with a baseline GMM system covering all phonemes, using the NIST 2003 Extended Data Evaluation. Each individual phoneme-specific GMM system performs worse than the baseline GMM, but fusing the scores of the 40 top-performing individual phoneme systems at the 8-conversation training condition yielded an equal error rate of 1.7%, a 2.6% absolute reduction in equal error rate from the baseline system. Further investigation showed that the three model-building approaches carry complementary information: error rates dropped on a per-phoneme basis when these systems were fused.
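To make the two quantities in the abstract concrete, the following is a minimal sketch of score-level fusion and equal error rate (EER) estimation. It rests on assumptions: the abstract does not state the fusion rule, so a plain per-trial average stands in for it, and the function names `fuse_mean` and `equal_error_rate` are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only. The fusion rule (simple averaging) and all names
# here are assumptions for illustration, not the method used in the paper.

def fuse_mean(score_lists):
    """Average the per-system scores for each trial.

    score_lists holds one inner list per system, all of equal length,
    with position i referring to the same trial in every system.
    """
    n_systems = len(score_lists)
    n_trials = len(score_lists[0])
    return [sum(s[i] for s in score_lists) / n_systems for i in range(n_trials)]


def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping the threshold over all observed scores.

    A trial is accepted when its score is >= the threshold. The EER is taken
    at the threshold where miss and false-alarm rates are closest, reported
    as the average of the two rates.
    """
    best_gap, eer = float("inf"), None
    for t in sorted(set(target_scores) | set(impostor_scores)):
        miss = sum(s < t for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer
```

With real data, the per-phoneme systems' scores for target and impostor trials would each be fused and then passed to `equal_error_rate`. The figures quoted in the abstract (a fused EER of 1.7% and a 2.6% absolute reduction) imply a baseline EER of about 4.3%.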
Cite as: Hansen, E.G., Slyh, R.E., Anderson, T.R. (2004) Speaker recognition using phoneme-specific GMMs. Proc. The Speaker and Language Recognition Workshop (Odyssey 2004), 179-184
@inproceedings{hansen04_odyssey,
  author={Eric G. Hansen and Raymond E. Slyh and Timothy R. Anderson},
  title={{Speaker recognition using phoneme-specific GMMs}},
  year=2004,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2004)},
  pages={179--184}
}