In this paper, we propose two techniques to extend the recently introduced m-vector system for speaker verification, which is based on a global Maximum Likelihood Linear Regression (MLLR) transformation used as a super-vector, into a multi-class MLLR m-vector system in the Universal Background Model (UBM) framework. In the first method, the Gaussian mean vectors of the UBM are grouped into several classes using either conventional K-means or a proposed clustering algorithm based on Expectation Maximization (EM) and Maximum Likelihood (ML) concepts. MLLR transformations are then calculated for the given speech data with respect to each class and are used in the form of a super-vector to represent the speaker by an m-vector. In the second approach, several MLLR transformations are estimated with respect to pre-defined models called anchors. The proposed systems show better performance than the conventional system. Furthermore, the proposed UBM-based system does not require additional alignment of the speech data with respect to the UBM to estimate multiple MLLR transformations. We further show that the proposed EM & ML clustering algorithm is robust to random initialization and yields system performance equal or comparable to that of K-means. Experimental results are reported on the NIST 2008 SRE core condition over various tasks.
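The first method can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that per-Gaussian occupancy counts and data means have already been collected from one alignment of the speech data against the UBM, and it assumes identical spherical covariances, under which the MLLR mean-transform estimate reduces to an occupancy-weighted least-squares fit. The function names (`kmeans`, `class_transform`, `multiclass_supervector`) are hypothetical, and only K-means is shown, not the proposed EM & ML clustering. Further processing of the super-vector into an m-vector is out of scope.

```python
import numpy as np

def kmeans(means, k, iters=20, seed=0):
    """Plain K-means over UBM mean vectors; returns a class label per Gaussian."""
    rng = np.random.default_rng(seed)
    centers = means[rng.choice(len(means), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every mean to every center.
        dist = ((means[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = means[labels == c].mean(axis=0)
    return labels

def class_transform(ubm_means, data_means, occupancy):
    """Affine transform W = [A b] for one class of Gaussians.

    Assumption: identical spherical covariances, so the estimate is the
    minimizer of sum_g occupancy_g * ||data_mean_g - W [ubm_mean_g; 1]||^2.
    """
    xi = np.hstack([ubm_means, np.ones((len(ubm_means), 1))])  # extended means
    w = np.sqrt(occupancy)[:, None]
    sol, *_ = np.linalg.lstsq(w * xi, w * data_means, rcond=None)
    return sol.T  # shape (D, D + 1)

def multiclass_supervector(ubm_means, data_means, occupancy, k):
    """Stack one transform per class into a single super-vector."""
    dim = ubm_means.shape[1]
    labels = kmeans(ubm_means, k)
    parts = []
    for c in range(k):
        idx = labels == c
        if not np.any(idx):
            # Empty class: fall back to the identity transform [I 0].
            w_c = np.hstack([np.eye(dim), np.zeros((dim, 1))])
        else:
            w_c = class_transform(ubm_means[idx], data_means[idx], occupancy[idx])
        parts.append(w_c.ravel())
    return np.concatenate(parts)
```

Because the class labels are fixed once from the UBM means, the same occupancy statistics serve every class, which is consistent with the claim that no additional alignment is needed. A class with fewer than D + 1 Gaussians leaves the fit underdetermined, in which case `np.linalg.lstsq` returns the minimum-norm solution.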
Bibliographic reference. Sarkar, A. K. / Barras, Claude (2013): "Anchor and UBM-based multi-class MLLR m-vector system for speaker verification", In INTERSPEECH-2013, 2450-2454.