Speaker recognition techniques have traditionally relied on purely acoustic features and models. During the last few years, however, the field of speaker recognition has started to show interest in the use of higher-level features. In particular, phonetic decodings modeled with statistical language models (n-grams) have already shown their effectiveness in several research works. However, the relationship between phonetic modeling precision and the accuracy of phonetic speaker recognition has not yet been sufficiently analyzed. As part of our preparation for the NIST 2005 speaker recognition evaluation, we have performed a number of experiments that show that there is a negligible correlation between phonetic modeling precision and phonetic speaker recognition accuracy. Furthermore, our experimental results show that phonetic speaker recognition results may even be better when using phonetic decodings in languages different from that of the speech.
Cite as: Torre Toledano, D., Fombella, C., Gonzalez Rodriguez, J., Hernandez Gomez, L. (2005) On the relationship between phonetic modeling precision and phonetic speaker recognition accuracy. Proc. Interspeech 2005, 1993-1996, doi: 10.21437/Interspeech.2005-626
@inproceedings{torretoledano05_interspeech,
  author={Doroteo {Torre Toledano} and Carlos Fombella and Joaquin {Gonzalez Rodriguez} and Luis {Hernandez Gomez}},
  title={{On the relationship between phonetic modeling precision and phonetic speaker recognition accuracy}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={1993--1996},
  doi={10.21437/Interspeech.2005-626}
}