In this paper, we propose using Vocal Tract Length Normalization (VTLN) to build the Universal Background Model (UBM) for a closed-set speaker identification system. Differences in Vocal Tract Length (VTL) among speakers are a major source of variability in the speech signal. Since the UBM is trained on data from many speakers, it statistically captures this inherent variation, which results in a “coarse” model in the acoustic space. As a consequence, the speaker models adapted from the UBM may overlap substantially in the acoustic space. We hypothesize that VTLN makes the UBM more compact, and that the speaker-adapted models obtained from this compact model are therefore better separated in the acoustic space. We perform experiments on the MIT, TIMIT and NIST 2004 SRE databases and show that using VTLN yields lower identification error rates than the conventional GMM-UBM based method.
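To illustrate what VTLN does to the frequency axis, the following is a minimal sketch of a piecewise-linear frequency warp, one common form of VTLN warping. The abstract does not state which warping function the authors use, so the function name `warp_frequency`, the break-point ratio of 0.85, and the warp-factor values are all assumptions made for illustration.

```python
import numpy as np


def warp_frequency(freqs, alpha, f_nyquist, break_ratio=0.85):
    """Piecewise-linear VTLN-style warp of a frequency axis (illustrative).

    Frequencies up to a break point f0 are scaled by alpha; above f0 a
    second linear segment maps [f0, f_nyquist] onto [alpha*f0, f_nyquist],
    so the warped axis still ends at the Nyquist frequency.
    All parameter names and defaults here are hypothetical.
    """
    freqs = np.asarray(freqs, dtype=float)
    # Choose f0 so that alpha * f0 never exceeds the Nyquist frequency.
    f0 = break_ratio * f_nyquist * min(1.0, 1.0 / alpha)
    # Slope of the upper segment joining (f0, alpha*f0) to (f_nyq, f_nyq).
    upper_slope = (f_nyquist - alpha * f0) / (f_nyquist - f0)
    return np.where(
        freqs <= f0,
        alpha * freqs,
        alpha * f0 + upper_slope * (freqs - f0),
    )


if __name__ == "__main__":
    axis = np.linspace(0.0, 8000.0, 9)  # 0 to 8 kHz in 1 kHz steps
    for alpha in (0.88, 1.00, 1.12):    # assumed range of warp factors
        print(alpha, np.round(warp_frequency(axis, alpha, 8000.0), 1))
```

In a typical VTLN setup, a warp factor is selected per speaker (for example by a maximum-likelihood search over a small grid of candidate values) and the warped axis is applied when computing the filterbank features; whether the authors follow exactly this procedure is not stated in the abstract.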
Bibliographic reference. Sarkar, A. K. / Umesh, S. / Rath, S. P. (2009): "Text-independent speaker identification using vocal tract length normalization for building universal background model", In INTERSPEECH-2009, 2331-2334.