7th International Conference on Spoken Language Processing
September 16-20, 2002
Structural Gaussian mixture models (SGMMs) are proposed for efficient text-independent speaker verification. A structural background model (SBM) is first constructed by hierarchically clustering all Gaussian mixture components in a universal background model (UBM). In this way the acoustic space is partitioned into multiple regions at different levels of resolution. For each target speaker, an SGMM is generated from the SBM through multi-level maximum a posteriori (MAP) adaptation. During testing, only a small subset of the Gaussian mixture components is scored for each feature vector, which significantly reduces the computational cost. Furthermore, the scores obtained in different layers of the tree-structured models are combined via a neural network for the final decision. Different configurations are compared in experiments conducted on the telephony speech data used in the NIST speaker verification evaluation. The experimental results show that the computation can be reduced by a factor of 17, with the equal error rate (EER) reduced by 8% compared with the baseline. The SGMM-SBM also shows some advantages over the recently proposed hash GMM.
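The pruned scoring idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a tree with only two levels, uses plain k-means on the component means as a stand-in for the hierarchical clustering, gives every parent node an assumed unit variance, and leaves out MAP adaptation and the neural network score fusion. All function names are hypothetical.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def log_sum_exp(terms):
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def build_two_level_tree(means, n_clusters, n_iter=10, seed=0):
    """Group component means with k-means (assumed stand-in for the paper's clustering)."""
    rng = random.Random(seed)
    dim = len(means[0])
    centroids = [list(m) for m in rng.sample(means, n_clusters)]
    for _ in range(n_iter):
        groups = [[] for _ in range(n_clusters)]
        for idx, m in enumerate(means):
            nearest = min(range(n_clusters),
                          key=lambda k: sum((a - b) ** 2
                                            for a, b in zip(m, centroids[k])))
            groups[nearest].append(idx)
        for k, members in enumerate(groups):
            if members:
                centroids[k] = [sum(means[i][d] for i in members) / len(members)
                                for d in range(dim)]
    # Drop clusters that ended up with no child components.
    kept = [(c, g) for c, g in zip(centroids, groups) if g]
    return [c for c, _ in kept], [g for _, g in kept]


def score_full(x, weights, means, variances):
    """Baseline: evaluate every mixture component."""
    terms = [math.log(w) + log_gauss(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    return log_sum_exp(terms), len(means)


def score_pruned(x, weights, means, variances, centroids, groups):
    """Evaluate the parent nodes, then only the children of the best parent."""
    parent_var = [1.0] * len(x)  # assumption: unit-variance parent nodes
    best = max(range(len(centroids)),
               key=lambda k: log_gauss(x, centroids[k], parent_var))
    members = groups[best]
    terms = [math.log(weights[i]) + log_gauss(x, means[i], variances[i])
             for i in members]
    return log_sum_exp(terms), len(centroids) + len(members)


# Toy 16-component model in two dimensions (synthetic data, for illustration only).
rng = random.Random(1)
means = [[rng.uniform(-5, 5), rng.uniform(-5, 5)] for _ in range(16)]
variances = [[1.0, 1.0] for _ in range(16)]
weights = [1.0 / 16] * 16
centroids, groups = build_two_level_tree(means, n_clusters=4)

x = [0.5, -1.0]
full, n_full = score_full(x, weights, means, variances)
pruned, n_pruned = score_pruned(x, weights, means, variances, centroids, groups)
```

Because the pruned score sums over a subset of the mixture terms, it is a lower bound on the full log-likelihood; the second return value counts how many Gaussians were evaluated in each case.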
Bibliographic reference. Xiang, Bing / Berger, Toby (2002): "Structural Gaussian mixture models for efficient text-independent speaker verification", In ICSLP-2002, 1317-1320.