Speaker verification is a binary classification task: deciding whether a claimed speaker uttered a given phrase. Current approaches typically adapt a general speaker-independent Universal Background Model (UBM), normally a Gaussian Mixture Model (GMM), to model a particular speaker. Verification is then performed by comparing the likelihood of the utterance under the speaker model with its likelihood under the UBM. Maximum a posteriori (MAP) estimation is commonly used to adapt the UBM to a particular speaker. However, speaker verification is a classification task, so robust discriminative adaptation schemes should yield gains over the standard MAP approach. This paper describes and evaluates two discriminative approaches to speaker verification. The first is a discriminative version of MAP based on Maximum Mutual Information (MMI-MAP). The second uses an augmented GMM (A-GMM) as the speaker-specific model, whose additional, augmented parameters are trained discriminatively and robustly using maximum margin estimation. The performance of these models is evaluated on the NIST 2002 SRE dataset. Although no gains were obtained using MMI-MAP, the A-GMM system gave an Equal Error Rate (EER) of 7.31%, a 30% relative reduction in EER compared to the best-performing GMM system.
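As a rough illustration of the baseline that the abstract compares against, the sketch below adapts a toy UBM to a speaker with MAP and scores an utterance by the log-likelihood ratio between the speaker model and the UBM. It is not the authors' implementation and does not cover MMI-MAP or the A-GMM. The diagonal covariances, the mean-only adaptation, the relevance factor of 16, and all function names are assumptions made for the example.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def component_log_probs(x, gmm):
    """log(w_k * N(x; mu_k, var_k)) for every mixture component k."""
    return [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(gmm["weights"], gmm["means"], gmm["vars"])
    ]

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def avg_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of the frames under the GMM."""
    total = sum(log_sum_exp(component_log_probs(x, gmm)) for x in frames)
    return total / len(frames)

def map_adapt_means(ubm, frames, relevance=16.0):
    """Mean-only relevance MAP adaptation (an assumed, common variant)."""
    num_comp = len(ubm["weights"])
    dim = len(ubm["means"][0])
    counts = [0.0] * num_comp
    sums = [[0.0] * dim for _ in range(num_comp)]
    for x in frames:
        logs = component_log_probs(x, ubm)
        norm = log_sum_exp(logs)
        for k in range(num_comp):
            post = math.exp(logs[k] - norm)  # component posterior for this frame
            counts[k] += post
            for d in range(dim):
                sums[k][d] += post * x[d]
    # Interpolate between the data mean and the UBM mean; components that
    # saw little enrollment data stay close to the UBM.
    new_means = [
        [(sums[k][d] + relevance * ubm["means"][k][d]) / (counts[k] + relevance)
         for d in range(dim)]
        for k in range(num_comp)
    ]
    return {"weights": ubm["weights"], "means": new_means, "vars": ubm["vars"]}

def verification_score(frames, speaker_gmm, ubm):
    """Log-likelihood ratio: positive values favour the claimed speaker."""
    return avg_log_likelihood(frames, speaker_gmm) - avg_log_likelihood(frames, ubm)

# Toy one-dimensional example with made-up numbers.
ubm = {"weights": [0.5, 0.5], "means": [[-1.0], [1.0]], "vars": [[1.0], [1.0]]}
enrollment = [[2.5 + 0.1 * i] for i in range(10)]
speaker = map_adapt_means(ubm, enrollment)
claim_score = verification_score([[3.0], [2.9], [3.1]], speaker, ubm)
```

A claim would be accepted when the score exceeds a threshold tuned on held-out trials; the EER is the operating point at which the false acceptance and false rejection rates are equal.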
Cite as: Longworth, C., Gales, M.J.F. (2006) Discriminative adaptation for speaker verification. Proc. Interspeech 2006, paper 1553-Wed1A1O.4, doi: 10.21437/Interspeech.2006-182
@inproceedings{longworth06_interspeech,
  author={C. Longworth and M. J. F. Gales},
  title={{Discriminative adaptation for speaker verification}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1553-Wed1A1O.4},
  doi={10.21437/Interspeech.2006-182}
}