ISCA Archive Interspeech 2013

Combining deep speaker specific representations with GMM-SVM for speaker verification

Ryan Price, Sangeeta Biswas, Koichi Shinoda

This study combines a Gaussian mixture model support vector machine (GMM-SVM) system with a nonlinear feature transformation that is discriminatively trained to extract speaker-specific features from MFCCs. A regularized siamese deep network (RSDN) separates the speaker information in the speech signal from the non-speaker-related information. The RSDN learns a hidden representation that characterizes speaker information well by training a subset of its hidden units on pairs of speech segments. MFCC features are input to a trained RSDN, and a subset of the hidden layer outputs is used as the new input features for a GMM-SVM system. We demonstrate the potential of this approach for text-independent speaker verification by applying it to a subset of the NIST SRE 2006 1conv4w-1conv4w task. The hybrid RSDN GMM-SVM system achieves about 5% relative improvement over the baseline GMM-SVM system.
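
The feature extraction step described above (MFCC frames in, a subset of hidden layer outputs out) can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, the sigmoid activation, the choice of the first units as the speaker-specific subset, and the random weights standing in for a trained RSDN are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward_layer(inputs, weights, biases):
    """Apply one fully connected sigmoid layer; `weights` holds one row per output unit."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def extract_speaker_features(frames, layers, n_speaker_units):
    """Pass each MFCC frame through the encoder and keep a subset of the
    final hidden layer as the new feature vector (assumed: the first units)."""
    features = []
    for frame in frames:
        hidden = frame
        for weights, biases in layers:
            hidden = forward_layer(hidden, weights, biases)
        features.append(hidden[:n_speaker_units])
    return features


def random_layer(n_in, n_out, rng):
    """Random weights as a stand-in for trained RSDN parameters (assumption)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


# All sizes below are illustrative assumptions, not values from the paper.
N_FRAMES, N_MFCC, N_HIDDEN, N_SPEAKER_UNITS = 20, 39, 50, 20

rng = random.Random(0)
layers = [random_layer(N_MFCC, N_HIDDEN, rng), random_layer(N_HIDDEN, N_HIDDEN, rng)]
mfcc_frames = [[rng.gauss(0.0, 1.0) for _ in range(N_MFCC)] for _ in range(N_FRAMES)]

speaker_features = extract_speaker_features(mfcc_frames, layers, N_SPEAKER_UNITS)
print(len(speaker_features), len(speaker_features[0]))
```

In a pipeline of the kind the abstract describes, the resulting per-frame vectors would take the place of the raw MFCCs as input to the GMM-SVM system; that stage is not shown here.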


doi: 10.21437/Interspeech.2013-638

Cite as: Price, R., Biswas, S., Shinoda, K. (2013) Combining deep speaker specific representations with GMM-SVM for speaker verification. Proc. Interspeech 2013, 2788-2792, doi: 10.21437/Interspeech.2013-638

@inproceedings{price13_interspeech,
  author={Ryan Price and Sangeeta Biswas and Koichi Shinoda},
  title={{Combining deep speaker specific representations with GMM-SVM for speaker verification}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={2788--2792},
  doi={10.21437/Interspeech.2013-638}
}