INTERSPEECH 2004 - ICSLP
In speaker identification, most of the computation is spent on distance or likelihood calculations between the feature vectors of the unknown speaker and the models in the database. The identification time depends on the number of feature vectors, their dimensionality, the complexity of the speaker models, and the number of speakers. In this paper, we focus on optimizing vector quantization (VQ) based speaker identification. We reduce the number of test vectors by pre-quantizing the test sequence prior to matching, and the number of speakers by pruning out unlikely speakers during the identification process. The best variants are then generalized to Gaussian mixture model (GMM) based modeling. We obtain a speed-up factor of 16:1 with the VQ-based system and 34:1 with the GMM-based system, with only a minor degradation in the identification error rate.
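The two reductions described in the abstract can be illustrated with a minimal sketch. The abstract does not say which pre-quantization or pruning variants were used, so everything concrete here is an assumption: a few k-means iterations stand in for pre-quantization, and a simple "keep the best fraction after each batch" rule stands in for speaker pruning. The function names (`prequantize`, `identify`) and parameters (`pq_size`, `batch`, `keep_fraction`) are hypothetical and chosen only for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_dist(vector, codebook):
    """Distance from a vector to its nearest code vector in a speaker model."""
    return min(sq_dist(vector, c) for c in codebook)


def prequantize(vectors, size, iters=5, seed=0):
    """Reduce the test sequence to at most `size` centroids.

    Assumption: plain k-means is used here; the paper's actual
    pre-quantizer may differ.
    """
    rng = random.Random(seed)
    centroids = rng.sample(vectors, min(size, len(vectors)))
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for v in vectors:
            j = min(range(len(centroids)), key=lambda i: sq_dist(v, centroids[i]))
            clusters[j].append(v)
        # Recompute each centroid as its cluster mean; keep it if the cluster is empty.
        centroids = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids


def identify(test_vectors, models, pq_size=8, batch=2, keep_fraction=0.5):
    """Return the speaker whose codebook gives the lowest accumulated distortion.

    models: dict mapping speaker id -> codebook (list of code vectors).
    Assumption: after each batch of vectors, only the best `keep_fraction`
    of the remaining speakers is kept (an illustrative pruning rule).
    """
    reduced = prequantize(test_vectors, pq_size)
    active = {speaker: 0.0 for speaker in models}
    for start in range(0, len(reduced), batch):
        chunk = reduced[start:start + batch]
        for speaker in active:
            active[speaker] += sum(nearest_dist(v, models[speaker]) for v in chunk)
        if len(active) > 1:
            keep = max(1, int(len(active) * keep_fraction))
            survivors = sorted(active, key=active.get)[:keep]
            active = {speaker: active[speaker] for speaker in survivors}
    return min(active, key=active.get)
```

In this sketch, the cost of matching drops in two ways: each speaker is scored against `pq_size` centroids instead of the full test sequence, and speakers pruned early are never scored against later batches. Aggressive settings trade accuracy for speed, which is consistent with the minor degradation in error rate the abstract reports.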
Bibliographic reference. Fränti, Pasi / Karpov, Evgeny / Kinnunen, Tomi (2004): "Real-time speaker identification", In INTERSPEECH-2004, 1805-1808.