8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Comparison of Two Kinds of Speaker Location Representation for SVM-Based Speaker Verification

Xianyu Zhao (1), Yuan Dong (1), Hao Yang (2), Jian Zhao (2), Liang Lu (2), Haila Wang (1)

(1) France Telecom R&D Beijing, China
(2) Beijing University of Posts & Telecommunications, China

In anchor modeling, each speaker utterance is represented as a fixed-length location vector in the space of reference speakers by scoring it against a set of anchor models. SVM-based speaker verification systems using this anchor location representation have shown promising results in previously reported work. In this paper, the linear combination weights from reference speaker weighting (RSW) adaptation are explored as an alternative speaker location representation. The RSW location representation is compared with the anchor location representation in various speaker verification tasks on the 2006 NIST Speaker Recognition Evaluation corpus. Experimental results indicate that, when utterances are long enough for reliable maximum likelihood estimation in RSW, the RSW location representation yields better speaker verification performance than the anchor location representation, whereas the latter is more effective for verifying short utterances in a high-dimensional representation space.
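The anchor location representation described above can be illustrated with a minimal sketch. It is not the authors' implementation: each anchor model is simplified here to a single diagonal-covariance Gaussian (a practical system would typically use GMMs), the normalization by a background model is an assumption, and the names `avg_log_likelihood`, `anchor_location`, `anchors`, and `background` are hypothetical.

```python
import math

def avg_log_likelihood(frames, mean, var):
    """Average per-frame log-likelihood under a diagonal-covariance Gaussian."""
    total = 0.0
    for x in frames:
        for xi, m, v in zip(x, mean, var):
            total += -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
    return total / len(frames)

def anchor_location(frames, anchors, background):
    """Return one background-normalized score per anchor model.

    The output length equals the number of anchors, regardless of how
    many frames the utterance contains.
    """
    base = avg_log_likelihood(frames, *background)
    return [avg_log_likelihood(frames, mean, var) - base for mean, var in anchors]

# Toy 2-D example (assumed data): three anchors given as (mean, variance) pairs.
anchors = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([3.0, 3.0], [1.0, 1.0]),
    ([-3.0, 0.0], [1.0, 1.0]),
]
background = ([0.0, 1.0], [4.0, 4.0])
utterance = [[2.8, 3.1], [3.2, 2.9], [3.0, 3.0]]

location = anchor_location(utterance, anchors, background)
```

In this sketch `location` has one entry per anchor, so utterances of any duration map to vectors of the same dimension, which is what allows them to be passed to an SVM as fixed-length inputs.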

Bibliographic reference. Zhao, Xianyu / Dong, Yuan / Yang, Hao / Zhao, Jian / Lu, Liang / Wang, Haila (2007): "Comparison of two kinds of speaker location representation for SVM-based speaker verification", in INTERSPEECH-2007, 774-777.