Interspeech'2005 - Eurospeech
The Gaussian mixture model-universal background model (GMM-UBM) approach has been dominant in text-independent speaker recognition tasks. However, the conventional GMM-UBM method assumes that the Gaussian mixtures are independent of one another, and so ignores the useful high-level speaker-dependent characteristics, such as word usage or speaking habits, that exist within the Gaussian mixtures. Building on the GMM-UBM method, we propose using Gaussian mixture correlation to model this high-level information for speaker recognition. In this method, we first cluster the Gaussian mixtures of the UBM into a small number of classes according to their mean vectors. We then learn a universal class transition probability matrix (UCTPM), which helps model the high-level speaker characteristics embedded in the Gaussian mixture correlation. During the training phase, a speaker-dependent class transition probability matrix is adapted from the UCTPM. Experiments on two different databases show that an average error rate reduction (ERR) of 20.38% can be achieved compared with the conventional GMM-UBM method.
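The three steps named in the abstract (clustering the UBM means, learning a class transition matrix, adapting it per speaker) can be illustrated with a minimal sketch. The abstract does not specify the details, so everything concrete here is an assumption made for illustration: plain k-means as the clustering method, the nearest mean as a stand-in for the top-scoring Gaussian of a frame, add-one smoothing of transition counts, and linear interpolation as the adaptation rule. The function names (`cluster_means`, `frame_classes`, `transition_matrix`, `adapt`) are hypothetical and do not come from the paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_means(means, k, iters=20, seed=0):
    """Assumed clustering step: plain k-means over the UBM mean vectors.
    Returns one class label per Gaussian."""
    rng = random.Random(seed)
    centroids = rng.sample(means, k)
    labels = [0] * len(means)
    for _ in range(iters):
        # Assign every mean vector to its closest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(m, centroids[c]))
                  for m in means]
        # Recompute each centroid as the average of its members.
        for c in range(k):
            members = [m for m, lab in zip(means, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


def frame_classes(frames, means, labels):
    """Map each frame to the class of its nearest Gaussian mean
    (an assumed simplification of picking the top-scoring Gaussian)."""
    return [labels[min(range(len(means)), key=lambda g: sq_dist(f, means[g]))]
            for f in frames]


def transition_matrix(class_seq, k, smooth=1.0):
    """Row-normalized counts of class-to-class transitions between
    consecutive frames, with additive smoothing (an assumption)."""
    counts = [[smooth] * k for _ in range(k)]
    for prev, cur in zip(class_seq, class_seq[1:]):
        counts[prev][cur] += 1.0
    return [[v / sum(row) for v in row] for row in counts]


def adapt(uctpm, speaker_tpm, weight=0.5):
    """Assumed adaptation rule: interpolate the speaker's own matrix
    with the universal one. Rows remain valid distributions."""
    return [[weight * s + (1.0 - weight) * u for u, s in zip(urow, srow)]
            for urow, srow in zip(uctpm, speaker_tpm)]


if __name__ == "__main__":
    # Toy data: four 2-D "UBM means" forming two obvious groups.
    ubm_means = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    labels = cluster_means(ubm_means, k=2)
    background = frame_classes(
        [[0.0, 0.1], [5.0, 4.9], [0.1, 0.1], [0.0, 0.0]], ubm_means, labels)
    speaker = frame_classes(
        [[5.0, 5.0], [5.1, 5.1], [0.0, 0.0]], ubm_means, labels)
    uctpm = transition_matrix(background, k=2)
    print(adapt(uctpm, transition_matrix(speaker, k=2)))
```

In a real system the class sequence would come from the frame-level Gaussian posteriors of a trained UBM, and the interpolation weight would typically depend on how much enrollment data the speaker provides. Both choices are left open here because the abstract does not state them.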
Bibliographic reference. Deng, Jing / Zheng, Thomas Fang / Song, Zhanjiang / Liu, Jian (2005): "Modeling high-level information by using Gaussian mixture correlation for GMM-UBM based speaker recognition", In INTERSPEECH-2005, 2033-2036.