11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

The Estimation and Kernel Metric of Spectral Correlation for Text-Independent Speaker Verification

Eryu Wang (1), Kong Aik Lee (2), Bin Ma (2), Haizhou Li (2), Wu Guo (1), Lirong Dai (1)

(1) University of Science & Technology of China, China
(2) A*STAR, Singapore

Gaussian mixture models (GMMs) are commonly used in text-independent speaker verification to model the spectral distribution of speech. Recent studies have shown that speaker information can be characterized effectively using just the mean vectors of the GMM in conjunction with a support vector machine (SVM). This paper advocates the use of spectral correlation captured by covariance matrices, and investigates its effectiveness both compared with and as a complement to the mean vectors. We examine two approaches to estimating the spectral correlation: homoscedastic and heteroscedastic modeling. We introduce two kernel metrics, the Frobenius angle and the log-Euclidean inner product, for measuring the similarity between speech utterances in terms of spectral correlation. Experiments conducted on the NIST 2006 speaker verification task show that an improvement of approximately 10% is achieved by using the spectral correlation in conjunction with the mean vectors.
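
The two kernel metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes the textbook forms of both quantities: the Frobenius angle as the cosine between two covariance matrices under the Frobenius inner product, and the log-Euclidean inner product as the Frobenius inner product of their matrix logarithms. The paper's exact definitions and normalizations may differ. All function names are hypothetical, the features are random data, and a plain per-utterance sample covariance stands in for the homoscedastic and heteroscedastic estimators, which are not reproduced here.

```python
import numpy as np


def frobenius_angle_kernel(a, b):
    """Cosine of the angle between two matrices under the Frobenius inner product."""
    inner = np.sum(a * b)  # equals trace(a.T @ b)
    return inner / (np.linalg.norm(a, "fro") * np.linalg.norm(b, "fro"))


def spd_log(a):
    """Matrix logarithm of a symmetric positive-definite matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(a)
    # V diag(log(lambda)) V^T; the broadcast scales each eigenvector column
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def log_euclidean_kernel(a, b):
    """Frobenius inner product of the matrix logarithms of two SPD matrices."""
    return np.sum(spd_log(a) * spd_log(b))


# Illustrative stand-in data: two "utterances" of 200 frames with 4 features each.
rng = np.random.default_rng(0)
utt_a = rng.standard_normal((200, 4))
utt_b = rng.standard_normal((200, 4)) * np.array([1.0, 2.0, 0.5, 1.5])

# Assumption: a plain sample covariance per utterance (rows are frames).
cov_a = np.cov(utt_a, rowvar=False)
cov_b = np.cov(utt_b, rowvar=False)

angle_score = frobenius_angle_kernel(cov_a, cov_b)
log_euclid_score = log_euclidean_kernel(cov_a, cov_b)
print("Frobenius angle kernel:", angle_score)
print("Log-Euclidean inner product:", log_euclid_score)
```

Under these assumed definitions the Frobenius angle is scale-invariant and equals 1 for identical matrices, while the log-Euclidean inner product is zero whenever either matrix is the identity, since the logarithm of the identity is the zero matrix.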

Bibliographic reference.  Wang, Eryu / Lee, Kong Aik / Ma, Bin / Li, Haizhou / Guo, Wu / Dai, Lirong (2010): "The estimation and kernel metric of spectral correlation for text-independent speaker verification", In INTERSPEECH-2010, 1065-1068.