8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Speaker Recognition Using Kernel-PCA and Intersession Variability Modeling

Hagai Aronowitz

IBM T.J. Watson Research Center, USA

This paper presents a new method for text-independent speaker recognition. We embed both training and test sessions into a session space. The session space is a direct sum of a common-speaker subspace and a speaker-unique subspace. The common-speaker subspace is Euclidean and is spanned by a set of reference sessions. Kernel-PCA is used to explicitly embed sessions into the common-speaker subspace, which typically captures attributes that are common to many speakers. The speaker-unique subspace is the orthogonal complement of the common-speaker subspace and typically captures attributes that are unique to a speaker. We model intersession variability in the common-speaker subspace and combine it with the information in the speaker-unique subspace. The proposed framework leads to a 43.5% reduction in error rate compared to a Gaussian Mixture Model (GMM) baseline.
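As a rough illustration of the embedding step described in the abstract, the sketch below projects a session onto the subspace spanned by a set of reference sessions using an eigendecomposition of the reference kernel matrix, and also returns the squared norm of the orthogonal residual (the part that would fall in the speaker-unique subspace). It is not the paper's implementation. The RBF kernel, the fixed-length session vectors, the omission of feature-space centering, and the names `rbf_kernel`, `fit_reference_basis`, and `embed` are all assumptions made for the example; the abstract does not specify the session kernel or the intersession variability model.

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    """Assumed stand-in kernel between sessions given as fixed-length vectors."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_reference_basis(K_ref, tol=1e-10):
    """Return a matrix A such that A.T @ k_x gives orthonormal coordinates
    in the subspace spanned by the reference sessions (no centering)."""
    eigvals, eigvecs = np.linalg.eigh(K_ref)
    keep = eigvals > tol * eigvals.max()  # drop numerically null directions
    return eigvecs[:, keep] / np.sqrt(eigvals[keep])

def embed(A, k_x, k_xx):
    """Embed one session.

    k_x  : kernel values between the session and each reference session
    k_xx : kernel value of the session with itself
    """
    coords = A.T @ k_x  # coordinates in the reference-spanned subspace
    # Squared norm of the component orthogonal to that subspace.
    residual_sq = max(float(k_xx - coords @ coords), 0.0)
    return coords, residual_sq

# Toy data: 8 hypothetical reference sessions and one test session.
rng = np.random.default_rng(0)
refs = rng.normal(size=(8, 5))
test_session = rng.normal(size=(1, 5))

K_ref = rbf_kernel(refs, refs)
A = fit_reference_basis(K_ref)
k_x = rbf_kernel(refs, test_session)[:, 0]
coords, residual_sq = embed(A, k_x, 1.0)  # an RBF kernel has k(x, x) = 1
```

In this sketch the reference sessions themselves embed with zero residual, and inner products between their coordinates reproduce the kernel matrix, which is the property that makes the subspace Euclidean and explicit.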

Bibliographic reference. Aronowitz, Hagai (2007): "Speaker recognition using kernel-PCA and intersession variability modeling", in INTERSPEECH-2007, 298-301.