We propose maximum a posteriori (MAP) estimation of a subspace transform for speaker recognition. The transform function is defined on the means of a Gaussian mixture model (GMM). Transform matrices and bias vectors are associated with separate regression classes so that both can be estimated with sufficient statistics given limited training data. Each transform matrix is further defined as a linear combination of a set of base transforms, so that the linear weights are the parameters to be estimated. We characterize speakers by their transform parameters and model them using a support vector machine (SVM). Experiments on the 2008 NIST SRE task illustrate the effectiveness of the method.
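As an illustration only, the mean transform described in the abstract can be sketched in a few lines of Python. The abstract does not give the exact equations, so the affine form mu' = A mu + b with A = sum_k w_k B_k is an assumption, as are the names `combine_bases` and `transform_mean` and the toy base matrices. The sketch covers a single matrix class and a single bias class, and it does not perform the MAP estimation of the weights.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def combine_bases(weights: Vector, bases: List[Matrix]) -> Matrix:
    """Return A = sum_k weights[k] * bases[k]; all bases share one shape."""
    rows, cols = len(bases[0]), len(bases[0][0])
    return [
        [sum(w * b[i][j] for w, b in zip(weights, bases)) for j in range(cols)]
        for i in range(rows)
    ]


def transform_mean(mean: Vector, weights: Vector,
                   bases: List[Matrix], bias: Vector) -> Vector:
    """Apply the assumed affine map mu' = A mu + b to one Gaussian mean."""
    a = combine_bases(weights, bases)
    return [
        sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
        for row, b_i in zip(a, bias)
    ]


# Toy 2-D example with two hypothetical base transforms.
identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
adapted = transform_mean([1.0, 2.0], [0.5, 0.5], [identity, swap], [0.25, -0.25])
print(adapted)
```

Under these assumptions, the weights and the bias vector are the speaker-specific quantities. One plausible reading of the abstract is that they are stacked into a single vector per speaker and passed to the SVM.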
Bibliographic reference. Zhu, Donglai / Ma, Bin / Lee, Kong Aik / Leung, Cheung-Chi / Li, Haizhou (2010): "MAP estimation of subspace transform for speaker recognition", In INTERSPEECH-2010, 1465-1468.