In speaker recognition (SRE), the commonly used feature vector consists of basic cepstral coefficients concatenated with their delta and double-delta cepstral features. This configuration is borrowed from speech recognition and may not be optimal for SRE. In this paper, we propose variant time-frequency cepstral (TFC) features, which are based on our previous work on language recognition. The feature vector is obtained by performing a temporal discrete cosine transform (DCT) on the cepstrum matrix and selecting the transformed elements that lie in a specific area with large variances. Different shapes and parameters are tested and the optimal configuration is obtained. Experimental results on the 2008 NIST speaker recognition evaluation short2 telephone-short3 telephone test set show that the proposed variant TFC is more effective than the conventional feature vectors.
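The extraction step described above (a temporal DCT over a sliding window of cepstral frames, followed by selection of high-variance elements) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the window length `context`, the number of retained temporal coefficients `n_time`, and the number of kept elements `n_keep` are all hypothetical, and a simple top-variance mask stands in for the specific selection shapes that the paper evaluates.

```python
import numpy as np


def dct_matrix(n_out, n_in):
    """First n_out rows of the orthonormal DCT-II basis for length-n_in signals."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    basis = np.cos(np.pi * (2 * n + 1) * k / (2 * n_in))
    scale = np.full((n_out, 1), np.sqrt(2.0 / n_in))
    scale[0, 0] = np.sqrt(1.0 / n_in)
    return scale * basis


def tfc_blocks(cepstra, context, n_time):
    """Apply a temporal DCT to each sliding window of cepstral frames.

    cepstra: array of shape (n_frames, n_ceps).
    Returns an array of shape (n_windows, n_ceps, n_time), where
    n_windows = n_frames - context + 1 (hypothetical windowing choice).
    """
    n_frames, _ = cepstra.shape
    if n_frames < context:
        raise ValueError("need at least `context` frames")
    basis = dct_matrix(n_time, context)  # (n_time, context)
    windows = np.stack(
        [cepstra[t:t + context] for t in range(n_frames - context + 1)]
    )  # (n_windows, context, n_ceps)
    # Contract over the time axis of each window.
    return np.einsum("kc,wcq->wqk", basis, windows)


def select_by_variance(blocks, n_keep):
    """Boolean mask over (n_ceps, n_time) marking the n_keep highest-variance elements.

    Assumption: a plain top-variance mask, used here in place of the
    shaped selection regions compared in the paper.
    """
    var = blocks.var(axis=0)
    top = np.argsort(var.ravel())[::-1][:n_keep]
    mask = np.zeros(var.shape, dtype=bool)
    mask.flat[top] = True
    return mask


# Illustrative usage on synthetic data (random numbers, not real cepstra).
rng = np.random.default_rng(0)
cepstra = rng.standard_normal((200, 13))
blocks = tfc_blocks(cepstra, context=9, n_time=5)
mask = select_by_variance(blocks, n_keep=30)
features = blocks[:, mask]  # one 30-dimensional vector per window
```

In practice the variance would be estimated on training data and the resulting mask then held fixed, so that every utterance yields feature vectors with the same layout.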
Bibliographic reference. Zhang, Wei-Qiang / Deng, Yan / He, Liang / Liu, Jia (2010): "Variant time-frequency cepstral features for speaker recognition", In INTERSPEECH-2010, 2122-2125.