This paper proposes a novel framework that uses a convolutional neural network (CNN) to improve the performance of gender-independent i-Vector PLDA-based speaker recognition. The convolutional layers of a CNN offer robustness to variations in the input features, including variations due to gender. A CNN with a linear bottleneck layer is trained for automatic speech recognition (ASR). Bottleneck features extracted with this CNN are then used to train a gender-independent universal background model (UBM). The UBM provides the frame posteriors used to train the i-Vector extractor matrix. To preserve speaker-specific information, we propose a hybrid approach that trains the i-Vector extractor matrix on MFCC features paired with frame posteriors derived from the bottleneck features. On the pooled trials of the NIST SRE10 C5 condition, our approach reduces the EER by 14.62% and minDCF 2010 by 14.42% compared to a standard MFCC-based gender-independent speaker recognition system.
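The hybrid step described in the abstract can be illustrated with a minimal NumPy sketch of Baum-Welch statistic accumulation. The sketch makes three assumptions. First, the UBM is a diagonal-covariance GMM trained on bottleneck features. Second, frame posteriors are computed in the bottleneck space, while the statistics are accumulated on the time-aligned MFCC frames. Third, per-component MFCC-domain means are estimated from those same posteriors. This third assumption is illustrative and is not the paper's documented procedure. The function names (`diag_gmm_posteriors`, `estimate_mfcc_means`, `hybrid_baum_welch_stats`) are hypothetical. The zeroth- and centered first-order statistics it returns are the usual inputs to i-Vector extractor (total variability matrix) training.

```python
import numpy as np


def diag_gmm_posteriors(feats, weights, means, variances):
    """Frame posteriors gamma[t, c] of a diagonal-covariance GMM (the UBM).

    feats: (T, D) bottleneck features; weights: (C,); means, variances: (C, D).
    """
    d = feats.shape[1]
    diff = feats[:, None, :] - means[None, :, :]                   # (T, C, D)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)      # (T, C)
    log_det = np.sum(np.log(variances), axis=1)                    # (C,)
    log_lik = -0.5 * (d * np.log(2.0 * np.pi) + log_det[None, :] + mahal)
    log_post = np.log(weights)[None, :] + log_lik
    # Normalize in the log domain for numerical stability.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


def estimate_mfcc_means(bn_feats, mfcc_feats, ubm, eps=1e-10):
    """Hypothetical helper: per-component MFCC means weighted by
    posteriors computed from the bottleneck-feature UBM."""
    gamma = diag_gmm_posteriors(bn_feats, *ubm)                    # (T, C)
    n = gamma.sum(axis=0)                                          # (C,)
    return (gamma.T @ mfcc_feats) / np.maximum(n, eps)[:, None]    # (C, D_mfcc)


def hybrid_baum_welch_stats(bn_feats, mfcc_feats, ubm, mfcc_means):
    """Zeroth-order stats N_c and centered first-order stats F_c.

    Alignment (posteriors) comes from bottleneck features; the statistics
    themselves are accumulated on MFCC frames aligned with them in time.
    """
    gamma = diag_gmm_posteriors(bn_feats, *ubm)                    # (T, C)
    n = gamma.sum(axis=0)                                          # N_c
    f = gamma.T @ mfcc_feats                                       # sum_t gamma_tc x_t
    return n, f - n[:, None] * mfcc_means                          # centered F_c


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, C, D_BN, D_MFCC = 200, 4, 6, 5                              # toy sizes
    bn = rng.standard_normal((T, D_BN))
    mfcc = rng.standard_normal((T, D_MFCC))
    ubm = (np.full(C, 1.0 / C), rng.standard_normal((C, D_BN)), np.ones((C, D_BN)))
    mu = estimate_mfcc_means(bn, mfcc, ubm)
    n, f = hybrid_baum_welch_stats(bn, mfcc, ubm, mu)
    print(n.shape, f.shape)
```

The sketch separates alignment from representation. The bottleneck features are trained for ASR and determine *which* UBM component a frame belongs to. The MFCCs are kept for the statistics, so the speaker-specific detail that the abstract aims to preserve still reaches the i-Vector extractor.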
Cite as: Ranjan, S., Hansen, J.H.L. (2017) Improved Gender Independent Speaker Recognition Using Convolutional Neural Network Based Bottleneck Features. Proc. Interspeech 2017, 1009-1013, doi: 10.21437/Interspeech.2017-1182
@inproceedings{ranjan17_interspeech, author={Shivesh Ranjan and John H.L. Hansen}, title={{Improved Gender Independent Speaker Recognition Using Convolutional Neural Network Based Bottleneck Features}}, year=2017, booktitle={Proc. Interspeech 2017}, pages={1009--1013}, doi={10.21437/Interspeech.2017-1182} }