9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Using MAP Estimation of Feature Transformation for Speaker Recognition

Donglai Zhu, Bin Ma, Haizhou Li

Institute for Infocomm Research, Singapore

We propose using a new feature transformation (FT) function to construct supervectors for support vector machine (SVM) based speaker recognition. Because bias vectors can be estimated more robustly than transformation matrices, we define the FT function in a flexible form in which the transformation matrices and the bias vectors are governed by separate sets of regression classes. Unlike the MLLR-based approach, which requires a continuous speech recognition system, our approach estimates the FT function parameters from a Gaussian mixture model (GMM). An iterative training procedure yields the maximum a posteriori (MAP) estimate of the FT function parameters, which avoids the numerical problems that insufficient training data can cause in maximum likelihood estimation. Evaluated on the NIST 2006 Speaker Recognition Evaluation (SRE2006), our approach outperforms a conventional SVM system based on GMM mean supervectors.
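
The abstract does not give the FT equations, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It uses a simplified mean-space analogue: each GMM mean is mapped to a * mu + b, where the per-dimension scale a is shared by regression class r(m) and the bias b by a separate regression class s(m). Quadratic (Gaussian-style) priors centred at the identity (a = 1, b = 0) with hypothetical weights tau_a and tau_b give a MAP objective, frame posteriors are held fixed from the GMM, and scales and biases are updated alternately in closed form. All names (responsibilities, map_transform, supervector, tau_a, tau_b) are hypothetical.

```python
import numpy as np

def responsibilities(X, weights, means, variances):
    """Posterior gamma[t, m] of each diagonal-covariance Gaussian for each frame."""
    diff = X[:, None, :] - means[None, :, :]              # (T, M, D)
    log_det = np.sum(np.log(variances), axis=1)           # (M,)
    mahal = np.sum(diff ** 2 / variances[None], axis=2)   # (T, M)
    D = X.shape[1]
    log_p = np.log(weights)[None] - 0.5 * (D * np.log(2 * np.pi) + log_det[None] + mahal)
    log_p -= log_p.max(axis=1, keepdims=True)              # stabilise before exp
    gamma = np.exp(log_p)
    return gamma / gamma.sum(axis=1, keepdims=True)

def map_transform(X, weights, means, variances, r_class, s_class,
                  n_r, n_s, tau_a=10.0, tau_b=10.0, n_iter=10):
    """Alternating MAP updates of scales a[r] and biases b[s] (hypothetical model).

    r_class[m] / s_class[m] give the scale / bias regression class of Gaussian m.
    With tau_a = tau_b = 0 a class that receives no data has a zero denominator;
    the prior terms keep every estimate finite.
    """
    gamma = responsibilities(X, weights, means, variances)  # fixed from the GMM
    N = gamma.sum(axis=0)                 # (M,)   zeroth-order statistics
    F = gamma.T @ X                       # (M, D) first-order statistics
    prec = 1.0 / variances                # (M, D)
    D = X.shape[1]
    a = np.ones((n_r, D))                 # prior mean: identity scaling
    b = np.zeros((n_s, D))                # prior mean: zero bias
    for _ in range(n_iter):
        # Scales with biases fixed: a = (tau_a + sum mu*(F - N*b)/var) / (tau_a + sum N*mu^2/var)
        resid = F - N[:, None] * b[s_class]
        num = np.full((n_r, D), tau_a)
        den = np.full((n_r, D), tau_a)
        np.add.at(num, r_class, prec * means * resid)
        np.add.at(den, r_class, prec * N[:, None] * means ** 2)
        a = num / den
        # Biases with scales fixed: b = sum (F - N*a*mu)/var / (tau_b + sum N/var)
        resid = F - N[:, None] * a[r_class] * means
        num = np.zeros((n_s, D))
        den = np.full((n_s, D), tau_b)
        np.add.at(num, s_class, prec * resid)
        np.add.at(den, s_class, prec * N[:, None])
        b = num / den
    return a, b

def supervector(a, b):
    """Stack all transform parameters into one vector for an SVM."""
    return np.concatenate([a.ravel(), b.ravel()])
```

Each block update exactly minimises a convex quadratic objective in that block, so the alternation never increases the MAP objective. Separating the scale and bias classes lets the biases, which the abstract notes are estimated more robustly, use finer classes than the scales.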


Bibliographic reference.  Zhu, Donglai / Ma, Bin / Li, Haizhou (2008): "Using MAP estimation of feature transformation for speaker recognition", In INTERSPEECH-2008, 849-852.