ODYSSEY 2004 - The Speaker and Language Recognition Workshop
May 31 - June 3, 2004
This paper presents experiments on unsupervised adaptation for a speaker detection system. The system is a standard speaker verification system based on cepstral features and Gaussian mixture models. Experiments were performed on cellular speech data taken from the NIST 2002 speaker detection evaluation, comprising about 30,000 trials involving 330 target speakers, with more than 90% of the trials being impostor trials. Unsupervised adaptation significantly increases system accuracy, reducing the minimum detection cost function (DCF) from 0.33 for the baseline system to 0.25 with unsupervised online adaptation (a relative reduction of about 24%). Two incremental adaptation modes were tested: using a fixed decision threshold to decide whether to adapt, or using the a posteriori probability of the true target to weight the adaptation. Both methods give similar results in their best configurations, but the latter is less sensitive to the actual threshold value.
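The difference between the two incremental adaptation modes can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the logistic mapping from verification score to target posterior, the `scale` and `target_prior` parameters, and the weighted running-mean update are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def adaptation_weight(score, threshold, mode, scale=1.0, target_prior=0.5):
    """Weight in [0, 1] given to a test segment when updating the target model.

    mode="threshold": hard decision, adapt fully if the score passes the threshold.
    mode="posterior": soft decision, weight by an estimated P(target | score).
    The logistic posterior estimate is a hypothetical choice for this sketch.
    """
    if mode == "threshold":
        return 1.0 if score >= threshold else 0.0
    if mode == "posterior":
        prior_logit = math.log(target_prior / (1.0 - target_prior))
        return _sigmoid(scale * (score - threshold) + prior_logit)
    raise ValueError("unknown mode: %r" % mode)


def update_mean(mean, count, segment_mean, weight):
    """Weighted running-mean update standing in for model adaptation.

    `count` is the accumulated weight of the data already in the model.
    """
    new_count = count + weight
    if new_count == 0:
        return list(mean), count
    new_mean = [(count * m + weight * s) / new_count
                for m, s in zip(mean, segment_mean)]
    return new_mean, new_count
```

With the hard threshold, a segment scoring just below the threshold contributes nothing and one just above contributes fully. With posterior weighting, both contribute with weights close to 0.5 under these assumptions, which is one intuitive reason a soft weighting could be less sensitive to the exact threshold value.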
Bibliographic reference. Barras, Claude / Meignier, Sylvain / Gauvain, Jean-Luc (2004): "Unsupervised online adaptation for speaker verification over the telephone", In ODYS-2004, 157-160.