September 22-25, 1997
The Gaussian mixture speaker model (GMM) is usually trained with the expectation-maximization (EM) algorithm to maximize the likelihood (ML) of observation data from an individual class. The GMM trained based the ML criterion has weak discriminative power when used as a classifier. In this paper, a discriminative training procedure is proposed to fine-tune the parameters in the GMMs. The goal of the training is to reduce the number of misclassified vector groups. Since a vector group can be thought as derived from a short sentence, this training procedure optimize the speaker identification performance more directly. Even though the algorithm itself is based on an heuristic idea, it works fine for many practical problems. Besides, the training speed is very fast. In an evaluation experiment with the YOHO database, when each speaker is modeled with 8 mixtures, the identification rate increases from 83.8% to 92.4% after applying this discriminative training algorithm.
Bibliographic reference. He, Jialong / Liu, Li / Palm, GŁnther (1997): "A discriminative training algorithm for Gaussian mixture speaker models", In EUROSPEECH-1997, 959-962.