Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Modeling High-Level Information by Using Gaussian Mixture Correlation for GMM-UBM Based Speaker Recognition

Jing Deng (1), Thomas Fang Zheng (1), Zhanjiang Song (2), Jian Liu (1)

(1) Tsinghua University, Beijing, China; (2) Beijing d-Ear Technologies Co. Ltd., China

The Gaussian mixture model-universal background model (GMM-UBM) has been dominant in text-independent speaker recognition tasks. However, the conventional GMM-UBM method assumes that each Gaussian mixture is independent and ignores the fact that the correlation among Gaussian mixtures carries useful high-level speaker-dependent characteristics, such as word usage or speaking habits. Building on the GMM-UBM method, we propose using Gaussian mixture correlation to model this high-level information for speaker recognition tasks. In this method, we first cluster the Gaussian mixtures of the UBM into a small number of classes in terms of their mean vectors; in the following step, a universal class transition probability matrix (UCTPM) is learned, which helps model the high-level speaker characteristics embedded in Gaussian mixture correlation. During the training phase, a speaker-dependent class transition probability matrix is adapted from the UCTPM. Experiments over two different databases show that an average 20.38% error rate reduction (ERR) can be achieved compared with the conventional GMM-UBM method.
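
The pipeline outlined in the abstract (cluster the UBM mixtures by mean vector, learn a universal class transition matrix, then adapt a speaker-dependent one) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the clustering algorithm, how frames are mapped to classes, or the adaptation rule. The sketch assumes plain k-means, nearest-mean frame assignment as a stand-in for the top-scoring mixture, additive smoothing, and a MAP-style interpolation with a hypothetical relevance factor. All function names and parameter values are invented for the example, and random data stands in for real features.

```python
import numpy as np

def cluster_means(means, n_classes, n_iter=20, seed=0):
    """Assumed step: plain k-means over UBM mean vectors; one class label per mixture."""
    rng = np.random.default_rng(seed)
    centers = means[rng.choice(len(means), size=n_classes, replace=False)]
    for _ in range(n_iter):
        dists = ((means[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_classes):
            if np.any(labels == k):  # leave an empty cluster's center unchanged
                centers[k] = means[labels == k].mean(axis=0)
    return labels

def frame_classes(frames, means, labels):
    """Assumed step: map each frame to the class of its nearest mixture mean."""
    dists = ((frames[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return labels[dists.argmin(axis=1)]

def transition_counts(class_seq, n_classes):
    """Count transitions between the classes of consecutive frames."""
    counts = np.zeros((n_classes, n_classes))
    np.add.at(counts, (class_seq[:-1], class_seq[1:]), 1.0)
    return counts

def normalize_rows(counts, floor=1e-3):
    """Turn counts into row-stochastic probabilities with a small additive floor."""
    smoothed = counts + floor
    return smoothed / smoothed.sum(axis=1, keepdims=True)

def adapt_matrix(uctpm, speaker_counts, relevance=16.0):
    """Assumed MAP-style rule: interpolate each row between speaker data and the UCTPM."""
    row_totals = speaker_counts.sum(axis=1, keepdims=True)
    alpha = row_totals / (row_totals + relevance)  # more speaker data, larger weight
    return alpha * normalize_rows(speaker_counts) + (1.0 - alpha) * uctpm

# Toy data standing in for UBM means and feature frames (hypothetical sizes).
rng = np.random.default_rng(1)
ubm_means = rng.normal(size=(32, 4))
labels = cluster_means(ubm_means, n_classes=4)

background = rng.normal(size=(500, 4))
uctpm = normalize_rows(
    transition_counts(frame_classes(background, ubm_means, labels), 4))

speaker = rng.normal(size=(120, 4))
speaker_tpm = adapt_matrix(
    uctpm, transition_counts(frame_classes(speaker, ubm_means, labels), 4))
```

Both matrices are row-stochastic by construction, so each row can be read as a distribution over the next frame's class given the current one. How such transition probabilities are combined with the usual GMM-UBM score is not stated in the abstract and is left out of the sketch.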

Bibliographic reference. Deng, Jing / Zheng, Thomas Fang / Song, Zhanjiang / Liu, Jian (2005): "Modeling high-level information by using Gaussian mixture correlation for GMM-UBM based speaker recognition", in INTERSPEECH-2005, 2033-2036.