MAP and sub-word level t-norm for text-dependent speaker recognition

Doroteo T. Toledano, Daniel Hernandez-Lopez, Cristina Esteve-Elizalde, Joaquin Gonzalez-Rodriguez, Ruben Fernandez Pozo, Luis Hernandez Gomez

This paper presents improvements in text-dependent speaker recognition obtained by using Maximum A Posteriori (MAP) adaptation of Hidden Markov Models together with new sub-word level T-Normalization procedures. Results on the YOHO corpus show that MAP adaptation provides a relative improvement of 22.6% in Equal Error Rate (EER) in comparison with Baum-Welch retraining and Maximum Likelihood Linear Regression (MLLR) adaptation. The newly proposed sub-word level T-Normalization procedures provide additional relative improvements of up to 20% in EER, particularly for small cohorts, in comparison with standard utterance-level T-Normalization.
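
To make the distinction between the two normalization levels concrete, the following is a minimal sketch, not the authors' implementation. Utterance-level T-Normalization is shown in its usual form: the target score is shifted and scaled by the mean and standard deviation of the scores that a cohort of impostor models gives to the same utterance. The sub-word variant is an assumption made for illustration only, since the abstract does not specify the procedure: each sub-word unit's score is normalized against the cohort scores for that unit, and the normalized values are averaged. The function names `t_norm` and `subword_t_norm`, the use of the population standard deviation, and the averaging step are all hypothetical choices.

```python
import statistics


def t_norm(score, cohort_scores):
    """Utterance-level T-norm: (score - cohort mean) / cohort std."""
    mu = statistics.mean(cohort_scores)
    # Population standard deviation; an assumption of this sketch.
    sigma = statistics.pstdev(cohort_scores)
    if sigma == 0:
        raise ValueError("cohort scores have zero variance")
    return (score - mu) / sigma


def subword_t_norm(unit_scores, cohort_unit_scores):
    """Hypothetical sub-word variant: normalize each unit, then average.

    unit_scores[i] is the target model's score for sub-word unit i.
    cohort_unit_scores[i] lists the cohort models' scores for unit i.
    """
    if len(unit_scores) != len(cohort_unit_scores):
        raise ValueError("one cohort score list is needed per unit")
    normalized = [
        t_norm(score, cohort)
        for score, cohort in zip(unit_scores, cohort_unit_scores)
    ]
    return statistics.mean(normalized)


# Toy scores, invented purely for illustration.
utterance_level = t_norm(4.0, [1.0, 3.0])
subword_level = subword_t_norm([4.0, 2.0], [[1.0, 3.0], [1.0, 3.0]])
```

In this sketch the per-unit statistics are estimated separately for every sub-word unit, so each unit contributes a score on a comparable scale before the units are combined.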

doi: 10.21437/Interspeech.2008-511

Cite as: Toledano, D.T., Hernandez-Lopez, D., Esteve-Elizalde, C., Gonzalez-Rodriguez, J., Pozo, R.F., Gomez, L.H. (2008) MAP and sub-word level t-norm for text-dependent speaker recognition. Proc. Interspeech 2008, 1933-1936, doi: 10.21437/Interspeech.2008-511

