8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Real-Time Speaker Identification

Pasi Fränti, Evgeny Karpov, Tomi Kinnunen

University of Joensuu, Finland

In speaker identification, most of the computation is spent on distance or likelihood calculations between the feature vectors of the unknown speaker and the models in the database. The identification time depends on the number of feature vectors, their dimensionality, the complexity of the speaker models, and the number of speakers. In this paper, we focus on optimizing vector quantization (VQ) based speaker identification. We reduce the number of test vectors by pre-quantizing the test sequence prior to matching, and the number of speakers by pruning out unlikely speakers during the identification process. The best variants are then generalized to Gaussian mixture model (GMM) based modeling as well. We obtain a speed-up factor of 16:1 with the VQ-based system and 34:1 with the GMM-based system, with a minor degradation in the identification error rate.
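The two reductions described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the mean-of-consecutive-vectors pre-quantizer, and the "keep the best fraction after every block" pruning rule are all assumptions chosen for brevity, and the abstract does not say which variants the paper actually evaluates.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_dist(vec, codebook):
    """Distance from vec to its nearest code vector in a speaker's codebook."""
    return min(sq_dist(vec, c) for c in codebook)


def prequantize(vectors, group):
    """Assumed pre-quantizer: replace each run of `group` consecutive
    test vectors by their mean, shrinking the test sequence."""
    out = []
    for i in range(0, len(vectors), group):
        chunk = vectors[i:i + group]
        dim = len(chunk[0])
        out.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return out


def identify(vectors, codebooks, block=2, keep=0.5):
    """Assumed pruning rule: score the active speakers block by block and,
    after each block, keep only the best `keep` fraction (at least one)."""
    active = {name: 0.0 for name in codebooks}  # accumulated distortion
    for start in range(0, len(vectors), block):
        for vec in vectors[start:start + block]:
            for name in active:
                active[name] += nearest_dist(vec, codebooks[name])
        if len(active) > 1:
            ranked = sorted(active, key=active.get)
            n_keep = max(1, math.ceil(len(ranked) * keep))
            active = {name: active[name] for name in ranked[:n_keep]}
    # The speaker with the smallest accumulated distortion wins.
    return min(active, key=active.get)


# Toy data (hypothetical): three speakers with two code vectors each.
codebooks = {
    "alice": [[0.0, 0.0], [1.0, 1.0]],
    "bob": [[5.0, 5.0], [6.0, 6.0]],
    "carol": [[10.0, 10.0], [11.0, 11.0]],
}
test_vectors = [[5.1, 4.9], [5.9, 6.2], [5.0, 5.2], [6.1, 5.8]]

reduced = prequantize(test_vectors, group=2)
winner = identify(reduced, codebooks)
```

In this sketch the cost of matching is roughly proportional to the number of test vectors times the number of active speakers times the codebook size, so pre-quantization shrinks the first factor and pruning shrinks the second as matching proceeds.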


Bibliographic reference. Fränti, Pasi / Karpov, Evgeny / Kinnunen, Tomi (2004): "Real-time speaker identification", In INTERSPEECH-2004, 1805-1808.