ODYSSEY 2004 - The Speaker and Language Recognition Workshop

May 31 - June 3, 2004
Toledo, Spain

Speaker Recognition using Phoneme-Specific GMMs

Eric G. Hansen, Raymond E. Slyh, Timothy R. Anderson

Air Force Research Laboratory, Human Effectiveness Directorate, Wright-Patterson AFB Ohio, USA

This paper compares three approaches to building phoneme-specific Gaussian mixture model (GMM) speaker recognition systems against a baseline GMM system covering all of the phonemes, using the NIST 2003 Extended Data Evaluation. The performance of any individual phoneme-specific GMM system falls below that of the baseline GMM, but fusing the scores of the top 40 performing individual phoneme systems in the 8-conversation training condition resulted in an equal error rate of 1.7%, which is a 2.6% absolute reduction in equal error rate from the baseline system. Further investigation showed that the three model-building approaches carry complementary information: error rates dropped on a per-phoneme basis when these systems were fused.
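The fusion and scoring steps mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the fusion rule, so a plain average over the best-ranked phoneme systems is assumed here; the function names `fuse_top_n` and `equal_error_rate`, the phoneme labels, and the ranking input are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def fuse_top_n(trial_scores, ranked_phonemes, n):
    """Average one trial's scores over the n best-ranked phoneme systems.

    trial_scores: dict mapping phoneme label -> score for this trial.
    ranked_phonemes: phoneme labels ordered best-first (assumed to come
        from some held-out ranking, e.g. by individual error rate).
    Phonemes that produced no score for this trial are skipped.
    """
    selected = [p for p in ranked_phonemes[:n] if p in trial_scores]
    if not selected:
        raise ValueError("no scores available for the selected phonemes")
    # Assumption: simple mean fusion; the paper may use a different rule.
    return mean(trial_scores[p] for p in selected)


def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores."""
    best_gap, eer = None, None
    for t in sorted(set(target_scores) | set(impostor_scores)):
        # Miss: a target trial scored below the threshold.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: an impostor trial scored at or above the threshold.
        false_alarm = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(miss - false_alarm)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer
```

In this sketch, each trial would first be reduced to one fused score with `fuse_top_n(trial_scores, ranked_phonemes, 40)`, and the fused target and impostor scores would then be passed to `equal_error_rate`. For reference, the figures quoted in the abstract imply a baseline equal error rate of 1.7% + 2.6% = 4.3%.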


Bibliographic reference.  Hansen, Eric G. / Slyh, Raymond E. / Anderson, Timothy R. (2004): "Speaker recognition using phoneme-specific GMMs", In ODYS-2004, 179-184.