We propose a new feature transformation (FT) function for constructing the supervectors used by support vector machines (SVMs) in speaker recognition. Because bias vectors can be estimated more robustly than transformation matrices, we define the FT function in a flexible form in which transformation matrices and bias vectors are controlled by separate regression classes. Unlike the maximum likelihood linear regression (MLLR) based approach, which needs a continuous speech recognition system, our FT function parameters are estimated based on a Gaussian mixture model (GMM). An iterative training procedure performs maximum a posteriori (MAP) estimation of the FT function parameters, which avoids the numerical problems that insufficient training data can cause in maximum likelihood estimation. Our approach is evaluated on the NIST 2006 Speaker Recognition Evaluation (SRE) and obtains better performance than a conventional SVM system based on GMM mean supervectors.
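As an illustration of the idea of separate regression classes, the following minimal sketch shows one plausible reading of such an FT function. The abstract does not give the exact form, so everything here is an assumption: the affine form `A @ x + b`, the hard assignment of a feature vector to a single GMM component, the hypothetical helper names `apply_ft` and `ft_supervector`, and the toy class tables. MAP estimation of the parameters is not shown.

```python
import numpy as np

def apply_ft(x, component, matrices, biases, matrix_class, bias_class):
    """Transform feature vector x with the parameters tied to one GMM component.

    Assumption: the component's matrix and bias are looked up through two
    independent class tables, so biases can use more classes than matrices.
    """
    A = matrices[matrix_class[component]]
    b = biases[bias_class[component]]
    return A @ x + b

def ft_supervector(matrices, biases):
    """Stack all FT parameters into a single vector (assumed SVM input)."""
    parts = [A.ravel() for A in matrices] + [b.ravel() for b in biases]
    return np.concatenate(parts)

# Toy setup (hypothetical): 4 GMM components, 2-dimensional features,
# 1 matrix class shared by all components, 2 bias classes.
dim = 2
matrices = [np.eye(dim)]
biases = [np.array([0.5, -0.5]), np.array([1.0, 1.0])]
matrix_class = [0, 0, 0, 0]  # component index -> matrix class
bias_class = [0, 0, 1, 1]    # component index -> bias class

x = np.array([1.0, 2.0])
y = apply_ft(x, 2, matrices, biases, matrix_class, bias_class)
sv = ft_supervector(matrices, biases)
```

With fewer matrix classes than bias classes, each matrix is estimated from the data of more components than each bias vector, which is consistent with the observation that bias vectors are the more robustly estimated of the two.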
Bibliographic reference. Zhu, Donglai / Ma, Bin / Li, Haizhou (2008): "Using MAP estimation of feature transformation for speaker recognition", In INTERSPEECH-2008, 849-852.