Support vector machine (SVM) equipped with sequence kernel has been proven to be a powerful technique for speaker verification. A number of sequence kernels have been recently proposed, each being motivated from different perspectives with diverse mathematical derivations. Analytical comparison of kernels becomes difficult. To facilitate such comparisons, we propose a generic structure showing how different levels of cues conveyed by speech utterances, ranging from low-level acoustic features to high-level speaker cues, are being characterized within a sequence kernel. We then identify the similarities and differences between the popular generalized linear discriminant sequence (GLDS) and GMM supervector kernels, as well as our own probabilistic sequence kernel (PSK). Furthermore, we enhance the PSK in terms of accuracy and computational complexity. The enhanced PSK gives competitive accuracy with the other two kernels. Fusing all the three kernels yields an EER of 4.83% on the 2006 NIST SRE core test.
Bibliographic reference. Lee, Kong-Aik / You, Changhuai / Li, Haizhou / Kinnunen, Tomi / Zhu, Donglai (2008): "Characterizing speech utterances for speaker verification with sequence kernel SVM", In INTERSPEECH-2008, 1397-1400.