ISCA Archive Odyssey 2016

Improving Robustness of Speaker Verification Against Mimicked Speech

Kuruvachan K George, C Santhosh Kumar, K I Ramachandran, Ashish Panda

Making speaker verification (SV) systems robust to spoofed/mimicked speech attacks is essential for their effective use in security applications. In this work, we show that using a proximal support vector machine backend classifier with i-vectors as inputs (i-PSVM) can help improve the performance of SV systems when mimicked speech is used as non-target trials. We compared our results with three systems: the state-of-the-art baseline i-vector with cosine distance scoring (i-CDS), i-vector with a backend SVM classifier (i-SVM), and cosine distance features with an SVM backend classifier (CDF-SVM). In all experiments with an SVM backend classifier, we oversampled the target utterance feature vectors before i-vector extraction using utterance partitioning followed by acoustic vector resampling (UP-AVR). UP-AVR helps address the data imbalance problem that arises because the development data provide a large number of non-target examples for training the models. In i-PSVM, the proximity of the test utterance to the target and non-target classes is the criterion for decision making, while in i-SVM, the distance from the separating hyperplane is the criterion. The i-PSVM approach proved advantageous when tested with mimicked speech as non-target trials, which indicates that proximity to the target speaker is a better criterion for speaker verification under mimicked speech. Further, we note that weighting the target and non-target class examples helps fine-tune the performance of i-PSVM. We then devised a strategy for estimating the weight of every example based on its cosine similarity to the centroid of the target class examples. The final i-PSVM with the example-based weighting scheme achieved an absolute improvement of 3.39% in equal error rate (EER) over the best baseline system, i-SVM. Subsequently, we fused the i-PSVM and i-SVM systems, and the results show that the combined system performs better than either individual system.
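To make the i-PSVM idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the standard linear proximal SVM formulation, in which training reduces to a regularized least-squares fit of the labels, extended with per-example weights. The function names, the parameter `nu`, and in particular the mapping from cosine similarity to a weight in [0, 1] are illustrative assumptions; the abstract does not give the exact weighting formula.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two vectors (the quantity used in cosine scoring)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def example_weights(X, y):
    """Weight each example by its cosine similarity to the target-class centroid.

    Assumption: similarity in [-1, 1] is mapped linearly to a weight in [0, 1].
    This mapping is hypothetical and only illustrates the idea.
    """
    centroid = X[y == 1].mean(axis=0)
    sims = np.array([cosine_similarity(x, centroid) for x in X])
    return 0.5 * (1.0 + sims)


def train_psvm(X, y, nu=1.0, weights=None):
    """Train a weighted linear proximal SVM in closed form.

    Minimizes 0.5 * (||w||^2 + gamma^2) + 0.5 * nu * sum_i s_i * xi_i^2
    subject to y_i * (x_i . w - gamma) + xi_i = 1, with labels y_i in {+1, -1}.
    """
    n, d = X.shape
    if weights is None:
        weights = np.ones(n)
    E = np.hstack([X, -np.ones((n, 1))])            # rows are [x_i, -1]
    lhs = np.eye(d + 1) / nu + E.T @ (weights[:, None] * E)
    rhs = E.T @ (weights * y)
    z = np.linalg.solve(lhs, rhs)                   # z = [w, gamma]
    return z[:-1], z[-1]


def psvm_score(x, w, gamma):
    """Signed score: positive means closer to the target-class proximal plane."""
    return float(x @ w - gamma)
```

In this sketch, rows of `X` would be i-vectors, with `y = +1` for the oversampled target examples and `y = -1` for non-target development examples. A trial would be accepted when `psvm_score` exceeds a threshold chosen on held-out data.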


doi: 10.21437/Odyssey.2016-35

Cite as: George, K.K., Kumar, C.S., Ramachandran, K.I., Panda, A. (2016) Improving Robustness of Speaker Verification Against Mimicked Speech. Proc. The Speaker and Language Recognition Workshop (Odyssey 2016), 245-251, doi: 10.21437/Odyssey.2016-35

@inproceedings{george16_odyssey,
  author={Kuruvachan K George and C Santhosh Kumar and K I Ramachandran and Ashish Panda},
  title={{Improving Robustness of Speaker Verification Against Mimicked Speech}},
  year=2016,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2016)},
  pages={245--251},
  doi={10.21437/Odyssey.2016-35}
}