Odyssey 2010: The Speaker and Language Recognition Workshop

Brno, Czech Republic
28 June – 1 July 2010

Cosine Similarity Scoring without Score Normalization Techniques

Najim Dehak (1), Reda Dehak (2), James Glass (1), Douglas Reynolds (3), Patrick Kenny (4)

(1) MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, (2) Laboratoire de Recherche et de Développement de l'EPITA (LRDE), Paris, (3) MIT Lincoln Laboratory, Lexington, (4) Centre de Recherche d'Informatique de Montréal (CRIM), Montréal

Recent work [1] introduced a simplified and highly effective approach to speaker recognition based on the cosine similarity between low-dimensional vectors, termed ivectors, defined in a total variability space. The total variability space representation is motivated by the popular Joint Factor Analysis (JFA) approach, but it does not require estimating separate speaker and channel spaces, and it has been shown to be less dependent on score normalization procedures such as z-norm and t-norm. In this paper, we introduce a modification to the cosine similarity that does not require explicit score normalization, relying instead on simple mean and covariance statistics from a collection of impostor speaker ivectors. By avoiding the complication of z- and t-norm, the new approach also allows a new unsupervised speaker adaptation technique to be applied to models defined in the ivector space. Experiments are conducted on the core condition of the NIST 2008 corpora, where, with adaptation, the new approach produces an equal error rate (EER) of 4.8% and a minimum decision cost function (MinDCF) of 2.3% on all female speaker trials.
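
As a rough illustration of the idea described in the abstract, the sketch below contrasts plain cosine scoring with a variant that folds impostor statistics into the score. The abstract does not state the exact formula, so the second function is an assumption: it length-normalizes the ivectors, centres them on the impostor mean, and scales by the square root of a diagonal impostor covariance. The function names, the diagonal-covariance choice, and the random stand-in data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def cosine_score(w_target, w_test):
    """Plain cosine similarity between two ivectors."""
    return float(w_target @ w_test /
                 (np.linalg.norm(w_target) * np.linalg.norm(w_test)))


def _unit(v):
    """Scale vectors (along the last axis) to unit Euclidean length."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def impostor_normalized_score(w_target, w_test, impostors):
    """Hypothetical cosine variant using impostor mean and covariance.

    ASSUMPTION: only the diagonal of the impostor covariance is used;
    the paper's exact formulation may differ.
    `impostors` has shape (n_impostors, dim).
    """
    w_target, w_test, impostors = _unit(w_target), _unit(w_test), _unit(impostors)
    mean_imp = impostors.mean(axis=0)   # impostor mean ivector
    std_imp = impostors.std(axis=0)     # sqrt of diagonal impostor covariance
    numerator = (w_target - mean_imp) @ (w_test - mean_imp)
    denominator = (np.linalg.norm(std_imp * w_target) *
                   np.linalg.norm(std_imp * w_test))
    return float(numerator / denominator)


# Stand-in data: random vectors in place of real ivectors.
rng = np.random.default_rng(0)
impostors = rng.normal(size=(200, 8))
target = rng.normal(size=8)
test = rng.normal(size=8)

raw = cosine_score(target, test)
normalized = impostor_normalized_score(target, test, impostors)
```

The point of such a formulation is that the impostor statistics are computed once and reused for every trial, rather than scoring each model or test utterance against a cohort as z-norm and t-norm do.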

Bibliographic reference.  Dehak, Najim / Dehak, Reda / Glass, James / Reynolds, Douglas / Kenny, Patrick (2010): "Cosine Similarity Scoring without Score Normalization Techniques", In Odyssey-2010, paper 015.