Alternative techniques are evaluated for text-independent speaker recognition in a speech-activated menu navigation task, typical of windows-based interactive computing. Even though the vocabulary employed may be relatively small, ease of management in the target application makes text independence highly desirable. The main techniques studied were weighted and unweighted vector quantisation (VQ), mixture Gaussian VQ and ergodic continuous hidden Markov models (CHMM). Data from 25 speakers was acquired in several sessions, with five repetitions of each utterance in each session and an inter-session interval of one or more weeks. The overall results with between-session training/test data showed that unweighted conventional VQ was inferior to variance-weighted VQ, mixture Gaussian VQ and CHMM. The latter three techniques gave similar performance, achieving a recognition accuracy of about 97 to 98% with utterances from the training vocabulary. Short utterances from outside the training vocabulary gave a recognition accuracy of approximately 93%.
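As a rough illustration of the VQ-based approach named in the abstract, the sketch below trains one codebook per speaker and identifies a test utterance by the lowest average distortion over its frames, ignoring which words were spoken. The abstract does not specify the features, codebook size, training procedure or weighting formula, so everything here is an assumption: plain k-means training, synthetic 2-D "features", and inverse-variance weights as one possible reading of "variance weighted". All function names are hypothetical.

```python
import random

def sq_dist(x, c, weights=None):
    """Squared Euclidean distance, optionally weighted per dimension."""
    if weights is None:
        weights = [1.0] * len(x)
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, c))

def train_codebook(frames, size, iters=20, seed=0):
    """Plain k-means (Lloyd) codebook; an assumed training procedure."""
    rng = random.Random(seed)
    codebook = [list(f) for f in rng.sample(frames, size)]
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for f in frames:
            nearest = min(range(size), key=lambda k: sq_dist(f, codebook[k]))
            cells[nearest].append(f)
        for k, cell in enumerate(cells):
            if cell:  # keep the old codeword if its cell is empty
                codebook[k] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook

def inverse_variance_weights(frames, floor=1e-8):
    """Per-dimension 1/variance weights (an assumed weighting scheme)."""
    weights = []
    for col in zip(*frames):
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        weights.append(1.0 / max(var, floor))
    return weights

def avg_distortion(frames, codebook, weights=None):
    """Mean distance from each frame to its nearest codeword."""
    total = sum(min(sq_dist(f, c, weights) for c in codebook) for f in frames)
    return total / len(frames)

def identify(frames, codebooks, weights=None):
    """Return the speaker whose codebook gives the lowest distortion."""
    return min(codebooks,
               key=lambda spk: avg_distortion(frames, codebooks[spk], weights))

def make_frames(center, n, seed):
    """Synthetic 2-D stand-in for real acoustic feature vectors."""
    rng = random.Random(seed)
    return [(rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0))
            for _ in range(n)]

# Toy usage: two synthetic "speakers" with well-separated feature clouds.
train = {"A": make_frames((0.0, 0.0), 200, seed=1),
         "B": make_frames((5.0, 5.0), 200, seed=2)}
codebooks = {spk: train_codebook(frames, size=4) for spk, frames in train.items()}
weights = inverse_variance_weights(train["A"] + train["B"])
test_utterance = make_frames((5.0, 5.0), 50, seed=3)
print(identify(test_utterance, codebooks))           # unweighted scoring
print(identify(test_utterance, codebooks, weights))  # weighted scoring
```

On real speech the two scoring rules can disagree, which is the kind of difference the abstract reports between unweighted and variance-weighted VQ; the toy data here is too cleanly separated to show it.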
Cite as: Zhu, X., Gao, Y., Ran, S., Chen, F., Macleod, I., Millar, B., Wagner, M. (1994) Text-independent speaker recognition using VQ, mixture Gaussian VQ and ergodic HMMs. Proc. ESCA Workshop on Automatic Speaker Recognition, Identification and Verification, 55-58
@inproceedings{zhu94_asriv,
  author={Xiaoyuan Zhu and Yuqing Gao and Shuping Ran and Fangxin Chen and Iain Macleod and Bruce Millar and Michael Wagner},
  title={{Text-independent speaker recognition using VQ, mixture Gaussian VQ and ergodic HMMs}},
  year=1994,
  booktitle={Proc. ESCA Workshop on Automatic Speaker Recognition, Identification and Verification},
  pages={55--58}
}