9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Characterizing Speech Utterances for Speaker Verification with Sequence Kernel SVM

Kong-Aik Lee (1), Changhuai You (1), Haizhou Li (1), Tomi Kinnunen (2), Donglai Zhu (1)

(1) Institute for Infocomm Research, Singapore; (2) University of Joensuu, Finland

Support vector machines (SVMs) equipped with a sequence kernel have proven to be a powerful technique for speaker verification. A number of sequence kernels have recently been proposed, each motivated from a different perspective and derived with different mathematics, which makes analytical comparison of the kernels difficult. To facilitate such comparisons, we propose a generic structure showing how the different levels of cues conveyed by speech utterances, ranging from low-level acoustic features to high-level speaker cues, are characterized within a sequence kernel. We then identify the similarities and differences between the popular generalized linear discriminant sequence (GLDS) and GMM supervector kernels, as well as our own probabilistic sequence kernel (PSK). Furthermore, we enhance the PSK in terms of both accuracy and computational complexity. The enhanced PSK achieves accuracy competitive with the other two kernels. Fusing all three kernels yields an equal error rate (EER) of 4.83% on the 2006 NIST SRE core test.
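
As a rough illustration of what a sequence kernel does, the sketch below maps each variable-length utterance to a fixed-size vector and compares two utterances with an inner product. It is loosely in the spirit of the GLDS kernel (a polynomial expansion of each frame, averaged over the utterance), but it is a simplified assumption rather than the formulation used in the paper: the normalization by a correlation matrix is omitted, and the function names `expand`, `utterance_vector`, and `sequence_kernel`, as well as the toy two-dimensional frames, are hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod


def expand(frame, degree=2):
    """Map one feature vector to all monomials of its entries up to `degree`."""
    terms = [1.0]  # constant (degree-0) term
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(frame)), d):
            terms.append(prod(frame[i] for i in idx))
    return terms


def utterance_vector(frames, degree=2):
    """Average the expanded frames, so any utterance length gives a fixed-size vector."""
    expanded = [expand(f, degree) for f in frames]
    n = len(expanded)
    return [sum(column) / n for column in zip(*expanded)]


def sequence_kernel(frames_a, frames_b, degree=2):
    """Plain inner product of the two utterance-level vectors (no normalization)."""
    va = utterance_vector(frames_a, degree)
    vb = utterance_vector(frames_b, degree)
    return sum(x * y for x, y in zip(va, vb))


# Toy utterances with different numbers of frames (illustrative values only).
utt_a = [[1.0, 2.0], [3.0, 4.0]]
utt_b = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
print(sequence_kernel(utt_a, utt_b))
```

Because the kernel value depends only on the utterance-level vectors, such a function can be handed to a standard SVM trainer even though the two utterances contain different numbers of frames.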

Bibliographic reference. Lee, Kong-Aik / You, Changhuai / Li, Haizhou / Kinnunen, Tomi / Zhu, Donglai (2008): "Characterizing speech utterances for speaker verification with sequence kernel SVM", in INTERSPEECH-2008, 1397-1400.