INTERSPEECH 2006 - ICSLP
We present a speaker recognition system with multiple GMM tokenizers as the front-end and vector space modeling as the back-end classifier. A GMM tokenizer captures the acoustic and phonetic characteristics of a speaker from the speech signal without requiring phonetic transcription. To broaden the coverage of speaker characteristics and provide more discriminative information, we propose a speaker clustering algorithm that builds multiple GMM tokenizers arranged in parallel. For an input utterance, each tokenizer outputs a token sequence, which is then represented by a vector of n-gram probabilities. The vectors from all tokenizers are concatenated to form a composite vector. Finally, a support vector machine (SVM) is used as the back-end classifier of the composite vectors. We use the 2002 NIST Speaker Recognition Evaluation (SRE) corpus for training the GMM tokenizers and for background modeling, and evaluate on the 2001 NIST SRE corpus.
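The front-end described above (parallel tokenizers, n-gram probability vectors, concatenation) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each tokenizer can be reduced to a list of component means and that a frame's token is the index of the nearest mean, which matches the most likely Gaussian only under equal weights and identical spherical covariances. The function names `tokenize`, `ngram_vector`, and `composite_vector`, the choice of bigrams, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import product


def tokenize(frames, means):
    """Map each frame to the index of the closest component mean.

    Simplifying assumption: with equal weights and identical spherical
    covariances, the most likely Gaussian is the one with the nearest mean.
    """
    def sq_dist(x, m):
        return sum((a - b) ** 2 for a, b in zip(x, m))

    return [min(range(len(means)), key=lambda k: sq_dist(x, means[k]))
            for x in frames]


def ngram_vector(tokens, num_tokens, n=2):
    """Relative frequencies of all num_tokens**n n-grams, in a fixed order."""
    counts = Counter(tuple(tokens[i:i + n])
                     for i in range(len(tokens) - n + 1))
    total = sum(counts.values())
    return [counts[g] / total if total else 0.0
            for g in product(range(num_tokens), repeat=n)]


def composite_vector(frames, tokenizers, n=2):
    """Concatenate the n-gram vectors produced by each parallel tokenizer."""
    vec = []
    for means in tokenizers:
        tokens = tokenize(frames, means)
        vec.extend(ngram_vector(tokens, len(means), n))
    return vec


# Toy 2-D "feature frames" and two toy tokenizers (2 and 3 components).
frames = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.9), (0.0, 0.2)]
tokenizers = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 1.0), (1.0, 0.0), (0.5, 0.5)],
]
vector = composite_vector(frames, tokenizers)
```

The resulting composite vector has one block of entries per tokenizer and would then be passed to an SVM classifier. That back-end step is omitted here to keep the sketch dependency-free.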
Bibliographic reference. Ma, Bin / Zhu, Donglai / Tong, Rong / Li, Haizhou (2006): "Speaker cluster based GMM tokenization for speaker recognition", In INTERSPEECH-2006, paper 1429-Mon3A1O.4.