Odyssey 2010: The Speaker and Language Recognition Workshop
Brno, Czech Republic
In recent work, a simplified and highly effective approach to speaker recognition was introduced, based on the cosine similarity between low-dimensional vectors, termed ivectors, defined in a total variability space. The total variability space representation is motivated by the popular Joint Factor Analysis (JFA) approach, but it does not require estimating separate speaker and channel spaces, and it has been shown to be less dependent on score normalization procedures such as z-norm and t-norm. In this paper, we introduce a modification to the cosine similarity that does not require explicit score normalization, relying instead on simple mean and covariance statistics computed from a collection of impostor speaker ivectors. By avoiding the complication of z- and t-norm, the new approach also allows a new unsupervised speaker adaptation technique to be applied to models defined in the ivector space. Experiments are conducted on the core condition of the NIST 2008 corpora, where, with adaptation, the new approach produces an equal error rate (EER) of 4.8% and a minimum decision cost function (MinDCF) of 2.3% on all female speaker trials.
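To make the scoring idea concrete, here is a minimal Python sketch of cosine scoring with impostor statistics. The abstract does not give the modified formula, so the form below is an assumption: both ivectors are centered on the impostor mean and scaled per dimension by the impostor standard deviation (a diagonal covariance) before the cosine is taken. The function names `cosine_score`, `impostor_stats`, and `normalized_cosine_score`, and the `eps` guard, are hypothetical and not taken from the paper.

```python
import math


def cosine_score(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def impostor_stats(impostors):
    """Per-dimension mean and variance of a list of impostor vectors.

    Assumption: only the diagonal of the covariance is used.
    """
    n = len(impostors)
    dim = len(impostors[0])
    mean = [sum(v[d] for v in impostors) / n for d in range(dim)]
    var = [sum((v[d] - mean[d]) ** 2 for v in impostors) / n for d in range(dim)]
    return mean, var


def normalized_cosine_score(target, test, mean, var, eps=1e-12):
    """Cosine similarity after centering on the impostor mean and
    scaling each dimension by the impostor standard deviation.

    Assumed form, not the paper's exact formula.
    """
    def transform(v):
        return [(x - m) / math.sqrt(s + eps) for x, m, s in zip(v, mean, var)]

    return cosine_score(transform(target), transform(test))
```

In this sketch the impostor statistics are computed once and reused for every trial, so no per-trial z-norm or t-norm cohort scoring is needed, which is the property the abstract emphasizes.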
Bibliographic reference. Dehak, Najim / Dehak, Reda / Glass, James / Reynolds, Douglas / Kenny, Patrick (2010): "Cosine Similarity Scoring without Score Normalization Techniques", In Odyssey-2010, paper 015.