11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

MAP Estimation of Subspace Transform for Speaker Recognition

Donglai Zhu, Bin Ma, Kong Aik Lee, Cheung-Chi Leung, Haizhou Li

A*STAR, Singapore

We propose using maximum a posteriori (MAP) estimation of a subspace transform for speaker recognition. The transform function is defined on the means of the Gaussian mixture model (GMM), where transform matrices and bias vectors are associated with separate regression classes so that both can be estimated with sufficient statistics given limited training data. The transform matrices are further defined as linear combinations of a set of base transforms, so that the combination weights are the parameters to be estimated. We characterize the speakers with the transform parameters and model them using a support vector machine (SVM). Experiments on the 2008 NIST SRE task illustrate the effectiveness of the method.
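
As a rough illustration of the transform structure described in the abstract, the following minimal sketch applies a matrix built from weighted base transforms, plus a bias, to a set of GMM means. It is not the authors' implementation: the function names, the array shapes, the use of a single regression class, and the simple concatenation of weights and bias into a speaker vector are all assumptions made for illustration, and the MAP estimation step itself is not shown.

```python
import numpy as np

def subspace_transform_means(means, base_transforms, weights, bias):
    """Return A @ mu + b for each mean, with A = sum_k weights[k] * base_transforms[k].

    Assumed shapes (hypothetical): means (M, D), base_transforms (K, D, D),
    weights (K,), bias (D,). All means share one regression class here.
    """
    # Combine the K base transforms into a single D x D matrix.
    A = np.tensordot(weights, base_transforms, axes=1)
    # Transform every mean (one per row) and add the shared bias vector.
    return means @ A.T + bias

def speaker_vector(weights, bias):
    """Stack the transform parameters into one vector (assumed SVM input layout)."""
    return np.concatenate([weights, bias])

# Toy values, chosen only to exercise the functions.
base = np.stack([np.eye(2), 2.0 * np.eye(2)])   # K = 2 base transforms
w = np.array([1.0, 0.5])                        # combination weights
b = np.array([1.0, 1.0])                        # bias vector
mu = np.array([[1.0, 2.0], [0.0, -1.0]])        # two GMM means

adapted = subspace_transform_means(mu, base, w, b)
features = speaker_vector(w, b)
```

In this sketch only the K weights and the bias vary per speaker, while the base transforms are shared, which is what keeps the number of speaker-specific parameters small when training data is limited.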


Bibliographic reference.  Zhu, Donglai / Ma, Bin / Lee, Kong Aik / Leung, Cheung-Chi / Li, Haizhou (2010): "MAP estimation of subspace transform for speaker recognition", In INTERSPEECH-2010, 1465-1468.