15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Combining Source and System Information for Limited Data Speaker Verification

Rohan Kumar Das (1), S. Abhiram (2), S. R. M. Prasanna (1), A. G. Ramakrishnan (2)

(1) IIT Guwahati, India
(2) Indian Institute of Science, India

Speaker verification using limited data remains a challenge for practical applications. An analysis of an i-vector based speaker verification system using Mel-frequency cepstral coefficient (MFCC) features shows that performance drops drastically as the duration of the test data is reduced. This decrease is due to insufficient phonetic coverage when only the vocal tract (system) feature is captured. However, performance can be improved if voice source characteristics are also taken into consideration. This paper attempts to improve speaker verification performance using source characteristics. A recently proposed characterization of the voice source signal, the discrete cosine transform of the integrated linear prediction residual (DCTILPR), has been found to be useful as a speaker-specific feature. Speaker verification is performed on short test utterances from the NIST 2003 database using both the DCTILPR and MFCC features, and their score-level combination is found to give a significant performance improvement over the system using MFCC features alone.
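The abstract names two processing steps: the DCTILPR source feature and the score-level combination with an MFCC system. The following is a minimal, hypothetical sketch of both, not the authors' implementation. It assumes that the ILPR is the residual obtained by inverse filtering the original (non-pre-emphasized) speech with LP coefficients estimated from pre-emphasized speech. The function names, the segment boundaries (e.g., one pitch period), the LP order, the number of retained coefficients, the unit-norm scaling, and the fusion weight `w` are all illustrative assumptions. The weighted linear sum is one common form of score-level fusion; the abstract does not state which form the paper uses.

```python
import numpy as np


def lpc(frame: np.ndarray, order: int) -> np.ndarray:
    """LP coefficients a_1..a_p (autocorrelation method), so that
    s[n] is predicted as sum_k a_k * s[n - k]."""
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz autocorrelation matrix R[i, j] = r[|i - j|]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)  # tiny ridge for numerical safety (assumption)
    return np.linalg.solve(R, r[1 : order + 1])


def inverse_filter(x: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Prediction residual e[n] = x[n] - sum_k a_k x[n - k] (zero initial history)."""
    x = np.asarray(x, dtype=float)
    res = x.copy()
    for k in range(1, len(a) + 1):
        res[k:] -= a[k - 1] * x[:-k]
    return res


def dct2(x: np.ndarray) -> np.ndarray:
    """Orthonormal DCT-II, written out with NumPy to stay dependency-free."""
    N = len(x)
    n = np.arange(N)
    # basis[k, m] = cos(pi * (2m + 1) * k / (2N))
    basis = np.cos(np.pi * (2 * n + 1) * n[:, None] / (2 * N))
    c = basis @ x
    c[0] *= np.sqrt(1.0 / N)
    c[1:] *= np.sqrt(2.0 / N)
    return c


def dctilpr(frame, segment, order=12, num_coeffs=20, preemph=0.97):
    """Hypothetical DCTILPR extractor for one analysis frame.

    Assumption: ILPR = original (non-pre-emphasized) frame inverse-filtered
    with LP coefficients estimated from the pre-emphasized, windowed frame.
    `segment` = (start, stop) sample indices, e.g. one pitch period.
    """
    frame = np.asarray(frame, dtype=float)
    pre = np.append(frame[0], frame[1:] - preemph * frame[:-1])
    a = lpc(pre * np.hamming(len(pre)), order)
    ilpr = inverse_filter(frame, a)
    start, stop = segment
    c = dct2(ilpr[start:stop])[:num_coeffs]
    # Unit-norm scaling to remove gain differences (assumption)
    return c / (np.linalg.norm(c) + 1e-12)


def fuse_scores(s_mfcc, s_dct, w=0.5):
    """Score-level fusion as a weighted sum (one common choice; w is hypothetical)."""
    return w * s_mfcc + (1.0 - w) * s_dct
```

For example, `fuse_scores(s_mfcc, s_dct, w=0.7)` weights the MFCC system more heavily. In practice `w` would be tuned on development data, and the two systems' scores are assumed to be on comparable scales (e.g., after score normalization).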


Bibliographic reference.  Das, Rohan Kumar / Abhiram, S. / Prasanna, S. R. M. / Ramakrishnan, A. G. (2014): "Combining source and system information for limited data speaker verification", In INTERSPEECH-2014, 1836-1840.