New in the 2004 edition of the NIST Speaker Recognition Evaluation (SRE) was a condition in which unsupervised adaptation of speaker models is allowed. Despite promising results on development test material, hardly any benefit was obtained in the Evaluation itself. We analyze why this was the case. It appears that a minimum level of baseline performance is essential before adaptation can improve on the performance without adaptation. Further, the system should be well calibrated. For the conditions with 8 conversation sides we have been able to obtain an improvement from unsupervised adaptation on the NIST 2004 evaluation data, both for a UBM/GMM adaptation methodology and for a novel SVM adaptation methodology. The minimum DCF for a fused system drops from 0.259 for the unadapted condition to 0.231 for the adapted condition.
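The minimum DCF figures quoted above are the lowest value of the detection cost function over all possible decision thresholds. The following is a minimal sketch of how such a value can be computed from trial scores, not the evaluation's official scoring tool. The function name `min_dcf` is hypothetical, and the cost parameters (C_miss = 10, C_fa = 1, P_target = 0.01) and the normalization by the cost of the best trivial system are assumptions that the abstract does not state.

```python
from typing import Sequence


def min_dcf(
    target_scores: Sequence[float],
    nontarget_scores: Sequence[float],
    c_miss: float = 10.0,     # assumed cost of a miss
    c_fa: float = 1.0,        # assumed cost of a false alarm
    p_target: float = 0.01,   # assumed prior probability of a target trial
    normalize: bool = True,   # assumed: divide by the best trivial-system cost
) -> float:
    """Return the minimum detection cost over all decision thresholds."""
    n_tar = len(target_scores)
    n_non = len(nontarget_scores)
    if n_tar == 0 or n_non == 0:
        raise ValueError("need at least one target and one non-target score")

    # Label each trial (1 = target, 0 = non-target) and sort by score.
    trials = sorted(
        [(s, 1) for s in target_scores] + [(s, 0) for s in nontarget_scores]
    )

    # Threshold below every score: all trials accepted, so no misses
    # and every non-target trial is a false alarm.
    misses = 0
    false_alarms = n_non
    best = c_fa * (1.0 - p_target)

    i = 0
    while i < len(trials):
        # Raise the threshold just above the current score; trials with
        # tied scores are rejected together.
        score = trials[i][0]
        while i < len(trials) and trials[i][0] == score:
            if trials[i][1] == 1:
                misses += 1
            else:
                false_alarms -= 1
            i += 1
        cost = (
            c_miss * p_target * misses / n_tar
            + c_fa * (1.0 - p_target) * false_alarms / n_non
        )
        best = min(best, cost)

    if normalize:
        best /= min(c_miss * p_target, c_fa * (1.0 - p_target))
    return best
```

Because the minimum is taken over all thresholds, this measure ignores calibration. A system can reach a low minimum DCF and still incur a high actual cost at the threshold it really uses, which is one reason calibration matters separately.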
Cite as: Leeuwen, D.A.v. (2005) Speaker adaptation in the NIST speaker recognition evaluation 2004. Proc. Interspeech 2005, 1981-1984, doi: 10.21437/Interspeech.2005-623
@inproceedings{leeuwen05_interspeech,
  author={David A. van Leeuwen},
  title={{Speaker adaptation in the NIST speaker recognition evaluation 2004}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={1981--1984},
  doi={10.21437/Interspeech.2005-623}
}