Speaker recognition using phoneme-specific GMMs

Eric G. Hansen, Raymond E. Slyh, Timothy R. Anderson

This paper compares three approaches to building phoneme-specific Gaussian mixture model (GMM) speaker recognition systems against a baseline GMM system covering all phonemes, using the NIST 2003 Extended Data Evaluation. Individually, each phoneme-specific GMM system performs worse than the baseline GMM. However, fusing the scores of the 40 best-performing individual phoneme systems in the 8-conversation training condition yielded an equal error rate of 1.7%, a 2.6% absolute reduction in equal error rate relative to the baseline system. Further investigation showed that the three model-building approaches carry complementary information: error rates dropped on a per-phoneme basis when these systems were fused.
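
To illustrate the two quantities the abstract relies on, score fusion and equal error rate (EER), the following is a minimal sketch and not the authors' method. The abstract does not state how the phoneme-system scores were fused, so the plain averaging used here is an assumption, and the function names `fuse_mean` and `equal_error_rate` are hypothetical. The EER is approximated by sweeping a decision threshold over the observed scores and taking the point where the miss rate and the false-alarm rate are closest.

```python
def fuse_mean(per_system_scores):
    """Fuse scores by simple averaging (an assumption; the paper's fusion
    rule is not given in the abstract).

    per_system_scores: list of score lists, one per system, aligned by trial.
    Returns one fused score per trial.
    """
    n_systems = len(per_system_scores)
    return [sum(trial) / n_systems for trial in zip(*per_system_scores)]


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    A trial is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # Miss: a target (true speaker) trial is rejected.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-target (impostor) trial is accepted.
        fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - fa)
        if best_gap is None or gap < best_gap:
            # Report the midpoint of the two rates at the closest crossing.
            best_gap, eer = gap, (miss + fa) / 2
    return eer


if __name__ == "__main__":
    # Toy scores for illustration only; these are not data from the paper.
    fused = fuse_mean([[1.0, 2.0], [3.0, 4.0]])
    print(fused)
    print(equal_error_rate([2.0, 3.0, 1.0, 4.0], [0.0, 1.5, -1.0, 0.5]))
```

In practice, fusion weights are often learned on held-out data instead of being fixed to a uniform average, and the EER is usually read off a DET curve with interpolation; the threshold sweep above is only a coarse approximation on small trial sets.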

Cite as: Hansen, E.G., Slyh, R.E., Anderson, T.R. (2004) Speaker recognition using phoneme-specific GMMs. Proc. The Speaker and Language Recognition Workshop (Odyssey 2004), 179-184

