ESCA Workshop on Automatic Speaker Recognition, Identification, and Verification

Martigny, Switzerland
April 7-9, 1994

Text-Independent Speaker Recognition Using VQ, Mixture Gaussian VQ and Ergodic HMMs

Xiaoyuan Zhu, Yuqing Gao, Shuping Ran, Fangxin Chen, Iain Macleod, Bruce Millar, Michael Wagner

TRUST (Technology for Robust User-conscious Secure Transactions) Project, Australian National University, Canberra, Australia

Alternative techniques are evaluated for text-independent speaker recognition in a speech-activated menu navigation task, typical of windows-based interactive computing. Even though the vocabulary employed may be relatively small, ease of management in the target application makes text independence highly desirable. The main techniques studied were weighted and unweighted vector quantisation (VQ), mixture Gaussian VQ and ergodic continuous hidden Markov models (CHMM). Data from 25 speakers were acquired in several sessions, with five repetitions of each utterance in each session and an inter-session interval of one or more weeks. The overall results with between-session training/test data showed that unweighted conventional VQ was inferior to variance-weighted VQ, mixture Gaussian VQ and CHMM. The latter three techniques performed similarly, achieving a recognition accuracy of about 97 to 98% with utterances from the training vocabulary. Short utterances from outside the training vocabulary gave a recognition accuracy of approximately 93%.
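
As a rough illustration of the VQ family of techniques named in the abstract, the sketch below trains one codebook per speaker and identifies a test utterance by the codebook that gives the lowest average quantisation distortion. It is not the authors' implementation: the abstract does not specify the features, codebook size, training algorithm or the exact form of the variance weighting. The plain k-means training, the pooled inverse-variance weights, the two-dimensional synthetic "features" and all function names here are assumptions made for the example.

```python
import random

def sq_dist(a, b, w):
    # Weighted squared Euclidean distance; w holds one weight per dimension.
    return sum(wi * (x - y) ** 2 for x, y, wi in zip(a, b, w))

def inverse_variance_weights(frames):
    # Assumed weighting: 1 / variance of each feature dimension over all frames.
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - means[d]) ** 2 for f in frames) / n for d in range(dim)]
    return [1.0 / max(v, 1e-12) for v in var]

def train_codebook(frames, size, w, iters=20, seed=0):
    # Plain k-means, used here as a stand-in for codebook design.
    rng = random.Random(seed)
    codebook = [list(f) for f in rng.sample(frames, size)]
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for f in frames:
            nearest = min(range(size), key=lambda k: sq_dist(f, codebook[k], w))
            cells[nearest].append(f)
        for k, cell in enumerate(cells):
            if cell:  # keep the old codeword if its cell is empty
                codebook[k] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook

def avg_distortion(frames, codebook, w):
    # Mean distance from each frame to its nearest codeword.
    return sum(min(sq_dist(f, c, w) for c in codebook) for f in frames) / len(frames)

def identify(frames, codebooks, w):
    # Pick the speaker whose codebook quantises the utterance with least distortion.
    return min(codebooks, key=lambda s: avg_distortion(frames, codebooks[s], w))

def synth(center, n, rng):
    # Synthetic stand-in for a speaker's feature frames (hypothetical data).
    return [[rng.gauss(c, 0.5) for c in center] for _ in range(n)]

rng = random.Random(1)
centers = {"A": [0.0, 0.0], "B": [3.0, 3.0]}
train = {s: synth(c, 200, rng) for s, c in centers.items()}
pooled = [f for fs in train.values() for f in fs]

weights = inverse_variance_weights(pooled)   # variance-weighted VQ
unweighted = [1.0] * len(pooled[0])          # conventional (unweighted) VQ

codebooks = {s: train_codebook(fs, 4, weights) for s, fs in train.items()}
test_a = synth(centers["A"], 50, rng)
test_b = synth(centers["B"], 50, rng)
print(identify(test_a, codebooks, weights), identify(test_b, codebooks, weights))
```

Passing `unweighted` in place of `weights` gives the conventional VQ variant. The decision is text-independent in the sense that it depends only on the distribution of frames, not on their order or on which words were spoken.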


Bibliographic reference.  Zhu, Xiaoyuan / Gao, Yuqing / Ran, Shuping / Chen, Fangxin / Macleod, Iain / Millar, Bruce / Wagner, Michael (1994): "Text-independent speaker recognition using VQ, mixture Gaussian VQ and ergodic HMMs", In ASRIV-1994, 55-58.