This paper proposes i-vector transformations using neural networks to achieve noise-robust speaker recognition. A novel discriminative denoising autoencoder (DDAE) is applied to i-vectors to remove the effects of additive noise. The DDAE is trained to denoise and classify noisy i-vectors simultaneously, thereby adding speaker discriminability to the denoised i-vectors. Speaker recognition experiments on the NIST SRE 2012 task show 32% better error performance than a baseline system. Moreover, the proposed method outperforms conventional methods such as multi-condition training and a basic denoising autoencoder.
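To make the joint denoising-plus-classification objective concrete, below is a minimal sketch of a DDAE in PyTorch. The layer sizes, activation, i-vector dimension (400), loss weight `alpha`, and number of training speakers are illustrative assumptions; the abstract does not specify the network configuration or loss weighting used in the paper.

```python
# Minimal DDAE sketch: an encoder shared by a denoising (reconstruction) branch
# and a discriminative (speaker classification) branch. All hyperparameters
# below are assumed for illustration, not taken from the paper.
import torch
import torch.nn as nn

class DDAE(nn.Module):
    def __init__(self, ivec_dim=400, hidden_dim=512, num_speakers=1000):
        super().__init__()
        # Encoder maps a noisy i-vector to a hidden representation.
        self.encoder = nn.Sequential(nn.Linear(ivec_dim, hidden_dim), nn.Tanh())
        # Decoder reconstructs the clean i-vector (denoising branch).
        self.decoder = nn.Linear(hidden_dim, ivec_dim)
        # Classifier predicts the speaker label (discriminative branch).
        self.classifier = nn.Linear(hidden_dim, num_speakers)

    def forward(self, noisy_ivec):
        h = self.encoder(noisy_ivec)
        return self.decoder(h), self.classifier(h)

def ddae_loss(denoised, clean_ivec, logits, speaker_label, alpha=0.5):
    # Joint objective: MSE reconstruction loss plus weighted cross-entropy.
    recon = nn.functional.mse_loss(denoised, clean_ivec)
    clf = nn.functional.cross_entropy(logits, speaker_label)
    return recon + alpha * clf

# Example training step with random stand-in data (parallel noisy/clean i-vectors).
model = DDAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
noisy = torch.randn(32, 400)             # batch of noisy i-vectors
clean = torch.randn(32, 400)             # corresponding clean i-vectors
labels = torch.randint(0, 1000, (32,))   # speaker labels
denoised, logits = model(noisy)
loss = ddae_loss(denoised, clean, logits, labels)
opt.zero_grad()
loss.backward()
opt.step()
```

At test time, only the denoised output of the decoder would be passed on to the downstream scoring back-end; the classification branch serves to shape the hidden representation during training.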
Cite as: Mahto, S., Yamamoto, H., Koshinaka, T. (2017) i-Vector Transformation Using a Novel Discriminative Denoising Autoencoder for Noise-Robust Speaker Recognition. Proc. Interspeech 2017, 3722-3726, doi: 10.21437/Interspeech.2017-731
@inproceedings{mahto17_interspeech,
  author    = {Shivangi Mahto and Hitoshi Yamamoto and Takafumi Koshinaka},
  title     = {{i-Vector Transformation Using a Novel Discriminative Denoising Autoencoder for Noise-Robust Speaker Recognition}},
  year      = {2017},
  booktitle = {Proc. Interspeech 2017},
  pages     = {3722--3726},
  doi       = {10.21437/Interspeech.2017-731}
}