Gaussian mixture models (GMMs) are commonly used in text-independent speaker verification to model the spectral distribution of speech. Recent studies have shown that speaker information can be characterized effectively using only the mean vectors of the GMM in conjunction with a support vector machine (SVM). This paper advocates the use of spectral correlation, as captured by covariance matrices, and investigates its effectiveness both in comparison with and as a complement to the mean vectors. We examine two approaches to estimating the spectral correlation: homoscedastic and heteroscedastic modeling. We introduce two kernel metrics, the Frobenius angle and the log-Euclidean inner product, for measuring the similarity between speech utterances in terms of spectral correlation. Experiments conducted on the NIST 2006 speaker verification task show that an improvement of approximately 10% is achieved by using the spectral correlation in conjunction with the mean vectors.
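The abstract names the two kernel metrics without defining them. As a rough illustration only, the sketch below assumes the usual textbook forms: the Frobenius angle as the normalized trace inner product tr(AᵀB) / (‖A‖_F ‖B‖_F), and the log-Euclidean inner product as tr(log A · log B) for symmetric positive-definite matrices. These definitions, the function names, and the use of NumPy are assumptions made for illustration and may differ from the formulation in the paper.

```python
import numpy as np

def frobenius_angle_kernel(a, b):
    """Cosine of the angle between two matrices under the Frobenius inner product.

    Assumed form: tr(a^T b) / (||a||_F * ||b||_F).
    """
    inner = np.sum(a * b)  # equals tr(a^T b)
    return inner / (np.linalg.norm(a, "fro") * np.linalg.norm(b, "fro"))

def spd_logm(a):
    """Matrix logarithm of a symmetric positive-definite matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(a)
    # Scale each eigenvector column by the log of its eigenvalue, then rotate back.
    return (eigvecs * np.log(eigvals)) @ eigvecs.T

def log_euclidean_kernel(a, b):
    """Assumed form: tr(log(a) log(b)) for symmetric positive-definite a and b."""
    # Both logarithms are symmetric, so the elementwise sum equals the trace.
    return np.sum(spd_logm(a) * spd_logm(b))

# Hypothetical covariance matrices standing in for two utterances.
cov_a = np.array([[2.0, 0.3], [0.3, 1.0]])
cov_b = np.array([[1.5, 0.1], [0.1, 0.8]])
similarity_fro = frobenius_angle_kernel(cov_a, cov_b)
similarity_le = log_euclidean_kernel(cov_a, cov_b)
```

In an SVM setting, either function would be evaluated on pairs of per-utterance covariance matrices to fill a kernel matrix. The log-Euclidean variant requires strictly positive eigenvalues, so in practice the covariance estimates may need regularization.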
Bibliographic reference. Wang, Eryu / Lee, Kong Aik / Ma, Bin / Li, Haizhou / Guo, Wu / Dai, Lirong (2010): "The estimation and kernel metric of spectral correlation for text-independent speaker verification", In INTERSPEECH-2010, 1065-1068.