ODYSSEY 2004 - The Speaker and Language Recognition Workshop

May 31 - June 3, 2004
Toledo, Spain

Fusing Discriminative and Generative Methods for Speaker Recognition: Experiments on Switchboard and NFI/TNO Field Data

William M. Campbell, Douglas A. Reynolds, Joseph P. Campbell

MIT Lincoln Laboratory Lexington, MA USA

Discriminatively trained support vector machines have recently been introduced as a novel approach to speaker recognition. Support vector machines (SVMs) have a distinctly different modeling strategy in the speaker recognition problem. The standard Gaussian mixture model (GMM) approach focuses on modeling the probability density of the speaker and the background (a generative approach). In contrast, the SVM models the boundary between the classes. Another interesting aspect of the SVM is that it does not directly produce probabilistic scores. This poses a challenge for combining results with a GMM. We therefore propose strategies for fusing the two approaches. We show that the SVM and GMM are complementary technologies. Recent evaluations by NIST (telephone data) and NFI/TNO (forensic data) give a unique opportunity to test the robustness and viability of fusing GMM and SVM methods. We show that fusion produces a system which can have relative error rates 23% lower than individual systems.

Full Paper

Bibliographic reference.  Campbell, William M. / Reynolds, Douglas A. / Campbell, Joseph P. (2004): "Fusing discriminative and generative methods for speaker recognition: experiments on switchboard and NFI/TNO field data", In ODYS-2004, 41-44.