We present an approach to speaker recognition in the text-independent domain of conversational telephone speech using a text-constrained system designed to employ select high-frequency keywords in the speech stream. The system uses speaker word models generated via Hidden Markov Models (HMMs) - a departure from the traditional Gaussian Mixture Model (GMM) approach dominant in text-independent work, but commonly employed in text-dependent systems - with the expectation that HMMs take greater advantage of sequential information and support more detailed modeling, which could be used to aid recognition. Even with a keyword inventory that covers a mere 10% of the word tokens and a system that does not yet incorporate many standard speaker recognition normalization schemes, this approach is already achieving equal error rates of 1% on NIST's 2001 Extended Data task.
Cite as: Boakye, K., Peskin, B. (2004) Text-constrained speaker recognition on a text-independent task. Proc. The Speaker and Language Recognition Workshop (Odyssey 2004), 129-134
@inproceedings{boakye04_odyssey,
  author={Kofi Boakye and Barbara Peskin},
  title={{Text-constrained speaker recognition on a text-independent task}},
  year=2004,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2004)},
  pages={129--134}
}