ODYSSEY 2004 - The Speaker and Language Recognition Workshop
May 31 - June 3, 2004
This paper compares three approaches to building phoneme-specific Gaussian mixture model (GMM) speaker recognition systems against a baseline GMM system covering all of the phonemes, using the NIST 2003 Extended Data Evaluation. Individually, each phoneme-specific GMM system performs worse than the baseline GMM. However, fusing the top 40 performing scores of the individual phoneme systems at the 8-conversation training condition yielded an equal error rate of 1.7%, a 2.6% absolute reduction in equal error rate relative to the baseline system. Further investigation showed that the three model-building approaches carry complementary information: error rates dropped on a per-phoneme basis when these systems were fused.
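As a rough illustration of the evaluation described in the abstract, the sketch below ranks per-phoneme systems by their individual equal error rate (EER), averages the scores of the best N systems, and measures the EER of the fused scores. The abstract does not state how the scores were fused or how the top 40 were selected, so the simple score averaging, the ranking on the same trials, and the names equal_error_rate and fuse_top_n are assumptions made for illustration only, not the authors' method.

```python
from typing import Dict, List, Sequence


def equal_error_rate(target_scores: Sequence[float],
                     impostor_scores: Sequence[float]) -> float:
    """Approximate the EER by sweeping a threshold over all observed scores.

    Returns the mean of the miss rate and false-alarm rate at the threshold
    where the two rates are closest to each other.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # A trial is accepted when its score is >= t.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer


def system_eer(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """EER of one system, given per-trial scores and target/impostor labels."""
    targets = [s for s, is_target in zip(scores, labels) if is_target]
    impostors = [s for s, is_target in zip(scores, labels) if not is_target]
    return equal_error_rate(targets, impostors)


def fuse_top_n(system_scores: Dict[str, List[float]],
               labels: Sequence[bool], n: int) -> List[float]:
    """Average the per-trial scores of the n systems with the lowest EER.

    Assumption: plain averaging, with systems ranked on the same trials.
    A real evaluation would rank and tune on held-out data.
    """
    ranked = sorted(system_scores,
                    key=lambda name: system_eer(system_scores[name], labels))
    chosen = ranked[:n]
    return [sum(system_scores[name][i] for name in chosen) / len(chosen)
            for i in range(len(labels))]


if __name__ == "__main__":
    # Toy data (invented): two target trials followed by two impostor trials.
    labels = [True, True, False, False]
    scores = {
        "aa": [2.0, 0.5, 1.0, -1.0],
        "iy": [0.5, 2.0, -1.0, 1.0],
        "n": [-1.0, -1.0, 1.0, 1.0],
    }
    fused = fuse_top_n(scores, labels, n=2)
    print("fused scores:", fused)
    print("fused EER:", system_eer(fused, labels))
```

In this toy setup the two selected systems each make errors on different trials, so averaging their scores separates targets from impostors better than either system alone, which is the kind of complementary information the abstract reports.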
Bibliographic reference. Hansen, Eric G. / Slyh, Raymond E. / Anderson, Timothy R. (2004): "Speaker recognition using phoneme-specific GMMs", In ODYS-2004, 179-184.