12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Speaker Verification Using Sparse Representations on Total Variability i-vectors

Ming Li (1), Xiang Zhang (2), Yonghong Yan (2), Shrikanth Narayanan (1)

(1) University of Southern California, USA
(2) Chinese Academy of Sciences, China

In this paper, the sparse representation computed by l1-minimization with quadratic constraints is employed to model the i-vectors in the low-dimensional total variability space after channel compensation with Within-Class Covariance Normalization and Linear Discriminant Analysis. First, we propose the background normalized l2 residual as a scoring criterion. Second, we demonstrate that Tnorm can be achieved efficiently by using the Tnorm data as the non-target samples in the over-complete dictionary. Finally, we demonstrate an overall performance improvement by fusing the proposed system with the conventional i-vector based support vector machine (SVM) and cosine distance scoring systems. Experimental results show that the proposed fusion system achieved 4.05% (male) and 5.25% (female) equal error rate (EER) after Tnorm on the single-single multi-language handheld telephone task of NIST SRE 2008, and outperformed the SVM baseline with 7.1% and 4.9% relative EER reduction for the male and female tasks, respectively.
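The scoring idea in the abstract can be illustrated with a small sketch. It rests on several assumptions that are not taken from the paper: the l1 problem is solved in its penalized (Lagrangian) form with plain ISTA rather than in the quadratically constrained form named above, the function names and the `lam` parameter are hypothetical, and the score shown (target residual divided by background residual) is only one plausible reading of a "background normalized l2 residual", not necessarily the authors' definition.

```python
import numpy as np


def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ista_l1(D, y, lam=0.01, n_iter=500):
    """Approximately solve min_x 0.5 * ||D x - y||_2^2 + lam * ||x||_1 with ISTA.

    D: (dim, n_atoms) dictionary whose columns are i-vectors
       (target samples followed by non-target/background samples).
    y: (dim,) test i-vector.
    Assumption: penalized form used as a stand-in for the constrained problem.
    """
    # Step size 1/L, where L is the squared spectral norm of D.
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)
        x = soft_threshold(x - step * grad, step * lam)
    return x


def background_normalized_residual(D, y, x, target_cols):
    """Hypothetical score: l2 residual using only target coefficients,
    divided by the l2 residual using only background coefficients.
    Smaller values mean the test vector is better explained by the target.
    """
    mask = np.zeros(D.shape[1], dtype=bool)
    mask[target_cols] = True
    r_target = np.linalg.norm(y - D[:, mask] @ x[mask])
    r_background = np.linalg.norm(y - D[:, ~mask] @ x[~mask])
    return r_target / (r_background + 1e-12)  # small constant avoids 0/0


if __name__ == "__main__":
    # Toy dictionary: column 0 is the "target", columns 1-2 are "background".
    D = np.eye(3)
    y = np.array([1.0, 0.0, 0.0])
    x = ista_l1(D, y, lam=0.1)
    print(background_normalized_residual(D, y, x, target_cols=[0]))
```

In this reading, placing Tnorm cohort i-vectors among the background columns of `D` means the same sparse solve also supplies the impostor-side residual, which is consistent with the efficiency claim in the abstract.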

Bibliographic reference.  Li, Ming / Zhang, Xiang / Yan, Yonghong / Narayanan, Shrikanth (2011): "Speaker verification using sparse representations on total variability i-vectors", In INTERSPEECH-2011, 2729-2732.