ISCA Archive Interspeech 2008

Using MAP estimation of feature transformation for speaker recognition

Donglai Zhu, Bin Ma, Haizhou Li

We propose a new feature transformation (FT) function for constructing the supervectors used by support vector machines (SVMs) in speaker recognition. Because bias vectors can be estimated more robustly than transformation matrices, we define the FT function in a flexible form in which transformation matrices and bias vectors are controlled by separate regression classes. Unlike the MLLR-based approach, which requires a continuous speech recognition system, our FT function parameters are estimated with respect to a Gaussian mixture model (GMM). An iterative training procedure yields the maximum a posteriori (MAP) estimate of the FT function parameters, which avoids the numerical problems that insufficient training data can cause in maximum likelihood estimation. Our approach is evaluated on the NIST SRE 2006 evaluation and performs better than a conventional SVM system based on GMM mean supervectors.
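To make the idea of separate regression classes concrete, the following is a minimal sketch and not the authors' implementation. It assumes an affine transform of the form y = A x + b, a hard assignment of each frame to one GMM component, and a supervector formed by simply stacking all FT parameters; the abstract specifies none of these. The names `apply_ft`, `ft_supervector`, `mat_class`, and `bias_class` are hypothetical, and the MAP estimation of the parameters is not shown.

```python
import numpy as np

def apply_ft(x, comp, A, b, mat_class, bias_class):
    """Apply an affine FT to frames x of shape (T, D).

    comp[t] is the GMM component assigned to frame t (assumed hard assignment).
    mat_class[c] and bias_class[c] map component c to its matrix class and
    bias class; the two mappings are independent of each other.
    """
    A_sel = A[mat_class[comp]]    # (T, D, D): one matrix per frame
    b_sel = b[bias_class[comp]]   # (T, D): one bias per frame
    return np.einsum("tij,tj->ti", A_sel, x) + b_sel

def ft_supervector(A, b):
    """Stack all FT parameters into one vector (assumed layout)."""
    return np.concatenate([A.ravel(), b.ravel()])

# Toy setup: 4 components, 1 shared matrix class, 2 bias classes, D = 2.
D = 2
A = np.eye(D)[None, :, :]                   # (1, D, D)
b = np.array([[0.0, 0.0], [1.0, -1.0]])     # (2, D)
mat_class = np.array([0, 0, 0, 0])          # all components share one matrix
bias_class = np.array([0, 0, 1, 1])         # components split into two bias classes

x = np.array([[1.0, 2.0], [3.0, 4.0]])      # two frames
comp = np.array([0, 3])                     # assigned components per frame
y = apply_ft(x, comp, A, b, mat_class, bias_class)
sv = ft_supervector(A, b)
```

In this toy setup there are more bias classes than matrix classes, which mirrors the abstract's point that biases can be estimated more robustly and can therefore be tied less coarsely than matrices.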


doi: 10.21437/Interspeech.2008-273

Cite as: Zhu, D., Ma, B., Li, H. (2008) Using MAP estimation of feature transformation for speaker recognition. Proc. Interspeech 2008, 849-852, doi: 10.21437/Interspeech.2008-273

@inproceedings{zhu08_interspeech,
  author={Donglai Zhu and Bin Ma and Haizhou Li},
  title={{Using MAP estimation of feature transformation for speaker recognition}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={849--852},
  doi={10.21437/Interspeech.2008-273}
}