Odyssey 2008: The Speaker and Language Recognition Workshop

Stellenbosch, South Africa
January 21-24, 2008

Phoneme and Sub-Phoneme T-Normalization for Text-Dependent Speaker Recognition

Doroteo T. Toledano (1), Cristina Esteve-Elizalde (1), Joaquin Gonzalez-Rodriguez (1), Ruben Fernandez Pozo (2), Luis Hernandez Gomez (2)

(1) ATVS Biometric Recognition Group, Universidad Autonoma de Madrid, Spain
(2) GAPS, SSR, Universidad Politecnica de Madrid, Spain

Test normalization (T-Norm) is a score normalization technique that is regularly and successfully applied in text-independent speaker recognition. It is applied less frequently, however, to text-dependent or text-prompted speaker recognition, mainly because the improvement it yields in this context is more modest. In this paper we present a novel way to improve the performance of T-Norm for text-dependent systems. It consists of applying score T-Normalization at the phoneme or sub-phoneme level instead of at the sentence level. Experiments on the YOHO corpus show that, while standard sentence-level T-Norm does not improve the equal error rate (EER), phoneme-level and sub-phoneme-level T-Norm produce relative EER reductions of 18.9% and 20.1%, respectively, on a state-of-the-art HMM-based text-dependent speaker recognition system. Results are even better at operating points with low false acceptance rates.
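To illustrate the difference between sentence-level and phoneme-level T-Norm, the following is a minimal sketch, not the implementation evaluated in this paper. It assumes that per-phoneme scores are already available for the claimed speaker model and for a cohort of impostor models, and that the normalized phoneme scores are combined by a plain average. The function names, the array layout, the averaging step and the `eps` guard are all illustrative assumptions.

```python
import numpy as np


def t_norm(target_score, cohort_scores, eps=1e-8):
    """Sentence-level T-Norm: standardize one score with cohort statistics.

    cohort_scores holds the scores that the cohort (impostor) models
    obtain on the same test utterance.
    """
    cohort = np.asarray(cohort_scores, dtype=float)
    mu = cohort.mean()
    sigma = max(cohort.std(), eps)  # eps guard is an assumption, avoids division by zero
    return float((target_score - mu) / sigma)


def phoneme_level_t_norm(target_phone_scores, cohort_phone_scores, eps=1e-8):
    """Phoneme-level T-Norm (sketch): normalize each phoneme score separately.

    target_phone_scores: shape (P,), one score per phoneme in the utterance.
    cohort_phone_scores: shape (C, P), one row per cohort model.
    Averaging the normalized scores is an assumed combination rule.
    """
    target = np.asarray(target_phone_scores, dtype=float)
    cohort = np.asarray(cohort_phone_scores, dtype=float)
    mu = cohort.mean(axis=0)                      # per-phoneme cohort mean
    sigma = np.maximum(cohort.std(axis=0), eps)   # per-phoneme cohort std
    return float(np.mean((target - mu) / sigma))
```

The same per-unit normalization could in principle be applied to sub-phoneme units (for example, HMM states) by passing one score per unit instead of one per phoneme.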

