Tandem Features for Text-Dependent Speaker Verification on the RedDots Corpus

Md Jahangir Alam, Patrick Kenny, Vishwa Gupta


We use tandem features and a fusion of four systems for text-dependent speaker verification on the RedDots corpus. In the tandem system, a senone-discriminant neural network provides a low-dimensional bottleneck feature at each frame, which is concatenated with a standard Mel-frequency cepstral coefficient (MFCC) feature vector. The concatenated features are passed to a conventional GMM/UBM speaker recognition framework. To capture information complementary to the MFCCs, we also use linear frequency cepstral coefficients and wavelet-based cepstral coefficients for score-level fusion. We report results on the part 1 and part 4 (text-dependent) tasks of the RedDots corpus. Both the tandem feature-based system and the fused system provide significant improvements over the baseline GMM/UBM system in terms of equal error rate (EER) and the detection cost functions (DCFs) defined in the 2008 and 2010 NIST speaker recognition evaluations. On the part 1 task (impostor correct condition), the fused system reduced the EER from 2.63% to 2.28% for male trials and from 7.01% to 3.48% for female trials. On the part 4 task (impostor correct condition), the fused system reduced the EER from 2.49% to 1.96% for male trials and from 5.9% to 3.22% for female trials.
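The two operations the abstract describes, frame-level concatenation of bottleneck features with MFCCs, and score-level fusion of per-system scores, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names `tandem_features` and `fuse_scores` are hypothetical, the bottleneck matrix stands in for the output of the senone-discriminant network, and uniform fusion weights are an assumption (in practice fusion weights are typically trained, e.g. by logistic regression).

```python
import numpy as np


def tandem_features(mfcc, bottleneck):
    """Concatenate frame-aligned MFCC and bottleneck features.

    mfcc:       (T, d_mfcc) array, one MFCC vector per frame
    bottleneck: (T, d_bn) array, one bottleneck vector per frame
                (stand-in for the neural network's bottleneck layer output)
    Returns a (T, d_mfcc + d_bn) tandem feature matrix, which would then
    be fed to the GMM/UBM back end.
    """
    assert mfcc.shape[0] == bottleneck.shape[0], "frames must be aligned"
    return np.concatenate([mfcc, bottleneck], axis=1)


def fuse_scores(scores, weights):
    """Weighted-sum score-level fusion of per-system verification scores.

    scores:  one detection score per subsystem for a single trial
    weights: non-negative fusion weights (hypothetical; uniform here)
    Returns the fused score as a weighted average.
    """
    scores = np.asarray(scores, dtype=float)
    w = np.asarray(weights, dtype=float)
    return float(scores @ w / w.sum())


# Example: 100 frames of 20-dim MFCCs plus 64-dim bottleneck features
# yield 84-dim tandem features; four subsystem scores fused uniformly.
tandem = tandem_features(np.zeros((100, 20)), np.ones((100, 64)))
fused = fuse_scores([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0])
```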


DOI: 10.21437/Interspeech.2016-1465

Cite as

Alam, M.J., Kenny, P., Gupta, V. (2016) Tandem Features for Text-Dependent Speaker Verification on the RedDots Corpus. Proc. Interspeech 2016, 420-424.

BibTeX
@inproceedings{Alam+2016,
author={Md Jahangir Alam and Patrick Kenny and Vishwa Gupta},
title={Tandem Features for Text-Dependent Speaker Verification on the RedDots Corpus},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-1465},
url={http://dx.doi.org/10.21437/Interspeech.2016-1465},
pages={420--424}
}