ISCA Archive Interspeech 2017

Adversarial Network Bottleneck Features for Noise Robust Speaker Verification

Hong Yu, Zheng-Hua Tan, Zhanyu Ma, Jun Guo

In this paper, we propose a noise-robust bottleneck feature representation generated by an adversarial network (AN). The AN consists of two cascaded networks: an encoding network (EN) and a discriminative network (DN). Mel-frequency cepstral coefficients (MFCCs) of clean and noisy speech are fed into the EN, and the output of the EN serves as the noise-robust feature. The EN and DN are trained alternately: when training the DN, the noise types are used as the training labels; when training the EN, all labels are set to the same value, the clean-speech label, which aims to make the AN features invariant to noise and thus achieve noise robustness. We evaluate the proposed feature on a Gaussian Mixture Model-Universal Background Model based speaker verification system and compare it with MFCC features of speech enhanced by the short-time spectral amplitude minimum mean square error (STSA-MMSE) and deep neural network-based speech enhancement (DNN-SE) methods. Experimental results on the RSR2015 database show that the proposed AN bottleneck feature (AN-BN) dramatically outperforms the STSA-MMSE and DNN-SE based MFCCs across noise types and signal-to-noise ratios. Furthermore, the AN-BN feature also improves speaker verification performance under the clean condition.
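The alternating training scheme in the abstract can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' implementation: the network sizes, learning rate, and the use of single linear layers for the EN and DN are all assumptions made for brevity; the key point is that the DN step fits the true noise labels with the EN frozen, while the EN step uses the clean-speech label for every frame with the DN frozen.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_BN, K = 39, 20, 4   # hypothetical MFCC dim, bottleneck dim, noise types (class 0 = clean)
W_e = rng.normal(0, 0.1, (D_IN, D_BN))   # encoding network (EN) weights
W_d = rng.normal(0, 0.1, (D_BN, K))      # discriminative network (DN) weights
LR = 0.1

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def xent(p, y):
    # mean cross-entropy of predicted probabilities p against integer labels y
    return -np.log(p[np.arange(len(y)), y] + 1e-12).mean()

def dn_step(x, noise_labels):
    """DN phase: learn to classify the noise type; EN is frozen."""
    global W_d
    h = x @ W_e                          # bottleneck features
    p = softmax(h @ W_d)
    g = p.copy()
    g[np.arange(len(noise_labels)), noise_labels] -= 1   # dLoss/dlogits
    W_d -= LR * h.T @ g / len(x)         # gradient step on W_d only
    return xent(p, noise_labels)

def en_step(x):
    """EN phase: every label is set to 'clean' (class 0); DN is frozen."""
    global W_e
    clean = np.zeros(len(x), dtype=int)
    h = x @ W_e
    p = softmax(h @ W_d)
    g = p.copy()
    g[np.arange(len(clean)), clean] -= 1
    W_e -= LR * x.T @ (g @ W_d.T) / len(x)   # gradient step on W_e only
    return xent(p, clean)

# Toy stand-in for MFCC frames tagged with their (known) noise type.
x = rng.normal(size=(64, D_IN))
y = rng.integers(0, K, size=64)

for _ in range(20):                      # train the two networks in turn
    dn_loss = dn_step(x, y)
    en_loss = en_step(x)

bottleneck = x @ W_e                     # the noise-robust AN-BN feature
```

The EN step pushes the bottleneck toward a representation the DN cannot distinguish from clean speech, which is the mechanism the paper uses to obtain noise invariance.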

doi: 10.21437/Interspeech.2017-883

Cite as: Yu, H., Tan, Z.-H., Ma, Z., Guo, J. (2017) Adversarial Network Bottleneck Features for Noise Robust Speaker Verification. Proc. Interspeech 2017, 1492-1496, doi: 10.21437/Interspeech.2017-883

@inproceedings{yu17_interspeech,
  author={Hong Yu and Zheng-Hua Tan and Zhanyu Ma and Jun Guo},
  title={{Adversarial Network Bottleneck Features for Noise Robust Speaker Verification}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1492--1496},
  doi={10.21437/Interspeech.2017-883}
}