Adversarial Learning and Augmentation for Speaker Recognition

Jen-Tzung Chien, Kang-Ting Peng


This paper develops a new generative adversarial network (GAN) that artificially generates i-vectors to address unbalanced or insufficient data in speaker recognition based on probabilistic linear discriminant analysis (PLDA). Data augmentation is performed to improve system robustness to the variations of i-vectors under different numbers of training utterances. Our idea is to incorporate the class label into the GAN, which involves a minimax optimization problem for adversarial learning. We build a generator and a discriminator, where the generator produces class-conditional i-vectors such that the discriminator cannot identify them as fake samples. In particular, multiple learning objectives are optimized to build a specialized deep model for model regularization in speaker recognition. In addition to the minimax optimization of the adversarial loss, the posterior probabilities of class labels given real and fake samples are maximized. The cosine similarity between real and fake i-vectors is also minimized to preserve the quality of the generated i-vectors. The loss functions for data reconstruction and Gaussian regularization in the PLDA model are minimized. The experiments illustrate the merit of multi-objective learning for deep adversarial augmentation in speaker recognition.
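As a rough illustration of three of the objectives named in the abstract, the following NumPy sketch computes an adversarial loss, a class-label loss, and the cosine similarity between real and generated i-vectors. It is not the authors' implementation: the function names, the use of logits as inputs, and the standard binary cross-entropy form of the adversarial loss are all assumptions made for illustration. The abstract does not state how the terms are weighted or combined, so no combined objective is shown, and the reconstruction and Gaussian regularization terms are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_softmax(logits):
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def adversarial_loss(d_real_logits, d_fake_logits):
    """Discriminator cross-entropy (assumed standard GAN form).

    The discriminator minimizes this value; the generator pushes it up.
    """
    eps = 1e-12
    real_term = -np.log(sigmoid(d_real_logits) + eps).mean()
    fake_term = -np.log(1.0 - sigmoid(d_fake_logits) + eps).mean()
    return real_term + fake_term

def class_loss(class_logits, labels):
    """Negative log posterior of the true class label.

    Minimizing it maximizes the class posterior, for real or fake samples.
    """
    log_post = log_softmax(class_logits)
    return -log_post[np.arange(len(labels)), labels].mean()

def cosine_similarity(real_ivecs, fake_ivecs):
    """Row-wise cosine similarity between paired real and fake i-vectors."""
    num = (real_ivecs * fake_ivecs).sum(axis=1)
    den = (np.linalg.norm(real_ivecs, axis=1)
           * np.linalg.norm(fake_ivecs, axis=1) + 1e-12)
    return num / den
```

In a full training loop these quantities would be computed per mini-batch from the discriminator and classifier outputs; that wiring is left out here because the abstract gives no architectural details.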


DOI: 10.21437/Odyssey.2018-48

Cite as: Chien, J., Peng, K. (2018) Adversarial Learning and Augmentation for Speaker Recognition. Proc. Odyssey 2018 The Speaker and Language Recognition Workshop, 342-348, DOI: 10.21437/Odyssey.2018-48.


@inproceedings{Chien2018,
  author={Jen-Tzung Chien and Kang-Ting Peng},
  title={Adversarial Learning and Augmentation for Speaker Recognition},
  year=2018,
  booktitle={Proc. Odyssey 2018 The Speaker and Language Recognition Workshop},
  pages={342--348},
  doi={10.21437/Odyssey.2018-48},
  url={http://dx.doi.org/10.21437/Odyssey.2018-48}
}