Odyssey 2010: The Speaker and Language Recognition Workshop

Brno, Czech Republic
28 June - 1 July 2010

Unsupervised Speaker Adaptation based on the Cosine Similarity for Text-Independent Speaker Verification

Stephen Shum, Najim Dehak (1), Reda Dehak (2), James Glass (1)

(1) Massachusetts Institute of Technology, (2) Laboratoire de Recherche et de Developpement de l'EPITA

This paper proposes a new approach to unsupervised speaker adaptation inspired by the recent success of the factor analysis-based Total Variability Approach to text-independent speaker verification [1]. This approach represents speaker variability in terms of low-dimensional total factor vectors and, when paired with the simplicity of cosine similarity scoring, allows for easy manipulation and efficient computation [2]. The development of our adaptation algorithm is motivated by three goals: to have a robust method of setting an adaptation threshold, to minimize the amount of computation required for each adaptation update, and to simplify the associated score normalization procedures where possible. To address the last of these, we propose the Symmetric Normalization (S-norm) method, which takes advantage of the symmetry in cosine similarity scoring and achieves performance competitive with that of ZT-norm while requiring fewer parameter calculations. In subsequent experiments, we also assess an attempt to replace score normalization procedures altogether with a Normalized Cosine Similarity scoring function [3]. We evaluated the performance of our unsupervised speaker adaptation algorithm under various score normalization procedures on the 10sec-10sec and core conditions of the 2008 NIST SRE dataset. Using results without adaptation as our baseline, we found that the proposed methods consistently improve speaker verification performance and achieve state-of-the-art results.
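
As a rough illustration of the scoring ideas named in the abstract, the sketch below computes a cosine similarity score between two total factor vectors and applies a symmetric normalization against an impostor cohort. The abstract does not state the S-norm formula, so the form used here (the sum of the raw score standardized against the target-side cohort scores and against the test-side cohort scores) is an assumption, as are the function names `cosine_score` and `s_norm` and the toy vectors. It is not the authors' implementation.

```python
import math
from statistics import mean, pstdev


def cosine_score(w1, w2):
    """Cosine similarity between two total factor vectors (plain lists)."""
    dot = sum(a * b for a, b in zip(w1, w2))
    norm1 = math.sqrt(sum(a * a for a in w1))
    norm2 = math.sqrt(sum(b * b for b in w2))
    return dot / (norm1 * norm2)


def s_norm(w_target, w_test, cohort):
    """Symmetric score normalization (assumed form, see lead-in).

    The raw score is standardized twice, once with the statistics of the
    target vector scored against the impostor cohort and once with those
    of the test vector, and the two standardized scores are summed.
    """
    raw = cosine_score(w_target, w_test)
    target_scores = [cosine_score(w_target, c) for c in cohort]
    test_scores = [cosine_score(w_test, c) for c in cohort]
    z_target = (raw - mean(target_scores)) / pstdev(target_scores)
    z_test = (raw - mean(test_scores)) / pstdev(test_scores)
    return z_target + z_test


# Toy, hypothetical vectors; real total factor vectors have hundreds of dimensions.
cohort = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
target = [1.0, 2.0, 3.0]
test = [2.0, 1.0, 0.5]

print(cosine_score(target, test))
print(s_norm(target, test, cohort))
```

Because cosine similarity is symmetric in its two arguments, swapping the target and test vectors leaves the normalized score unchanged under this form. The same cohort serves both sides, which is one plausible reading of why fewer normalization parameters are needed than with ZT-norm.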


Bibliographic reference.  Shum, Stephen / Dehak, Najim / Dehak, Reda / Glass, James (2010): "Unsupervised Speaker Adaptation based on the Cosine Similarity for Text-Independent Speaker Verification", In Odyssey-2010, paper 016.