Symmetric distortion measure for speaker recognition

Evgeny Karpov, Tomi Kinnunen, Pasi Fränti

We consider matching functions in vector quantization (VQ) based speaker recognition systems. In VQ-based systems, a speaker model consists of a small collection of representative vectors, and matching is performed by computing a dissimilarity value between the unknown speaker's feature vectors and the speaker models. Typically, the average or total quantization error is used as the dissimilarity measure. However, this measure lacks the symmetry required of a proper distance measure. This is counterintuitive: the match score between speakers X and Y can differ from the match score between Y and X. Furthermore, the distortion measure can yield a zero value (a perfect match) for non-identical vector sets, which is undesirable. In this study, we investigate ways of turning quantization distortion functions into proper distance measures. The study discusses the theoretical properties of different measures and evaluates them on a subset of the NIST99 speaker recognition evaluation corpus.
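A minimal sketch of the two problems described above follows. It assumes Euclidean distance and the average (not total) quantization error. `avg_quantization_distortion` and `symmetrized_distortion` are hypothetical names. The symmetrization shown simply adds the two directional distortions; it is one illustrative option and not necessarily one of the measures studied in the paper.

```python
import numpy as np

def avg_quantization_distortion(X, C):
    """Mean Euclidean distance from each vector in X to its nearest vector in C."""
    # Pairwise distances, shape (len(X), len(C)); Euclidean metric is an assumption.
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=-1)
    # Quantize each x to its nearest codevector, then average the errors.
    return float(d.min(axis=1).mean())

def symmetrized_distortion(X, Y):
    # Hypothetical symmetrization: sum of both directional distortions,
    # so swapping X and Y cannot change the score.
    return avg_quantization_distortion(X, Y) + avg_quantization_distortion(Y, X)

X = np.array([[0.0, 0.0]])
Y = np.array([[0.0, 0.0], [3.0, 4.0]])  # X is a strict subset of Y

print(avg_quantization_distortion(X, Y))  # 0.0: "perfect match" for different sets
print(avg_quantization_distortion(Y, X))  # 2.5: asymmetric
print(symmetrized_distortion(X, Y))       # 2.5
```

Because X is contained in Y, every vector of X is quantized with zero error, so the one-directional measure reports a perfect match even though the sets differ. The reverse direction does not, which shows the asymmetry.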

Cite as: Karpov, E., Kinnunen, T., Fränti, P. (2004) Symmetric distortion measure for speaker recognition. Proc. 9th Conference on Speech and Computer (SPECOM 2004), 366-370

