9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

MAP and Sub-Word Level T-Norm for Text-Dependent Speaker Recognition

Doroteo T. Toledano (1), Daniel Hernandez-Lopez (1), Cristina Esteve-Elizalde (1), Joaquin Gonzalez-Rodriguez (1), Ruben Fernandez Pozo (2), Luis Hernandez Gomez (2)

(1) Universidad Autónoma de Madrid, Spain (2) Universidad Politécnica de Madrid, Spain

This paper presents improvements in text-dependent speaker recognition based on the use of Maximum A Posteriori (MAP) adaptation of Hidden Markov Models and the use of new sub-word level T-Normalization procedures. Results on the YOHO corpus show that the use of MAP adaptation provides a relative improvement of 22.6% in Equal Error Rate (EER) in comparison with Baum-Welch retraining and Maximum Likelihood Linear Regression (MLLR) adaptation. The newly proposed sub-word level T-Normalization procedures provide additional relative improvements, particularly for small cohorts, of up to 20% in EER in comparison with the normal utterance-level T-Normalization.
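As a rough illustration of the normalization the abstract refers to, the sketch below shows standard utterance-level T-Normalization: the claimed speaker's score is shifted and scaled by the mean and standard deviation of the scores that a cohort of impostor models gives the same utterance. The abstract does not describe the proposed sub-word level procedures, so `subword_t_norm` is only one hypothetical reading (normalize each sub-word score against the cohort's scores for that sub-word, then average) and should not be taken as the authors' method. All function and parameter names are assumptions made for this example.

```python
from statistics import mean, stdev


def t_norm(target_score, cohort_scores):
    """Utterance-level T-norm: z-score the target score against cohort scores."""
    if len(cohort_scores) < 2:
        raise ValueError("need at least two cohort scores to estimate a deviation")
    mu = mean(cohort_scores)
    sigma = stdev(cohort_scores)  # sample standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("cohort scores have zero variance")
    return (target_score - mu) / sigma


def subword_t_norm(target_subword_scores, cohort_subword_scores):
    """Hypothetical sub-word variant, NOT taken from the paper.

    target_subword_scores: one score per sub-word unit for the claimed speaker.
    cohort_subword_scores: one list per cohort model, aligned to the same units.
    Each sub-word score is T-normalized separately, then the results are averaged.
    """
    normalized = [
        t_norm(score, [cohort[i] for cohort in cohort_subword_scores])
        for i, score in enumerate(target_subword_scores)
    ]
    return mean(normalized)
```

For example, `t_norm(2.0, [0.0, 1.0, 2.0])` normalizes a score of 2.0 against a three-model cohort with mean 1.0 and sample standard deviation 1.0. With small cohorts the deviation estimate is noisy, which is the regime where the abstract reports the largest gains for the sub-word procedures.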

Bibliographic reference. Toledano, Doroteo T. / Hernandez-Lopez, Daniel / Esteve-Elizalde, Cristina / Gonzalez-Rodriguez, Joaquin / Fernandez Pozo, Ruben / Hernandez Gomez, Luis (2008): "MAP and sub-word level t-norm for text-dependent speaker recognition", in INTERSPEECH-2008, 1933-1936.