Denoising x-vectors for Robust Speaker Recognition

Mohammad Mohammadamini, Driss Matrouf, Paul-Gauthier Noé


Deep learning methods have led to significant improvements in speaker recognition systems, and the introduction of x-vectors as a speaker modeling method has made these systems more robust. However, the performance of x-vector systems still degrades significantly in challenging environments with noise and reverberation, so denoising techniques remain necessary. In this paper, we attempt, for the first time, to denoise x-vector speaker embeddings directly. First, we apply the i-MAP method, which assumes that both the noise and the clean x-vectors follow Gaussian distributions. Then, we use denoising autoencoders (DAE) to reconstruct the clean x-vector from its corrupted version. Next, we propose two hybrid systems that combine the statistical i-MAP method with a DAE. Finally, we propose a novel DAE architecture, named Deep Stacked DAE, composed of several DAEs in which each DAE receives as input the output of its predecessor concatenated with the difference between the noisy x-vector and that predecessor's output. Experiments on the Fabiol corpus show that the hybrid DAE i-MAP method outperforms the conventional DAE and i-MAP methods in several cases, and that Deep Stacked DAE gives better results than the other proposed methods in most cases. For utterances longer than 12 seconds, Deep Stacked DAE achieves a 51% improvement in terms of EER, and for utterances shorter than 2 seconds it gives an 18% improvement compared to the baseline system.
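
The input wiring of the Deep Stacked DAE described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the single tanh hidden layer, the layer sizes, the random untrained weights, and the choice to feed the first DAE the noisy x-vector alone are all assumptions made for illustration. Only the rule that each later DAE receives its predecessor's output concatenated with the difference between the noisy x-vector and that output comes from the abstract. Training is omitted.

```python
import numpy as np


def make_dae(in_dim, hidden_dim, out_dim, rng):
    """Create untrained weights for a one-hidden-layer DAE (hypothetical layout)."""
    return {
        "W1": rng.normal(0.0, 0.01, (in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.01, (hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }


def dae_forward(params, x):
    """Map a batch of input vectors to a batch of estimated clean x-vectors."""
    hidden = np.tanh(x @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]


def build_stack(dim, hidden_dim, num_blocks, seed=0):
    """Build a list of DAEs; all but the first take a 2*dim input."""
    rng = np.random.default_rng(seed)
    # Assumption: the first DAE has no predecessor, so it sees only the noisy x-vector.
    blocks = [make_dae(dim, hidden_dim, dim, rng)]
    for _ in range(num_blocks - 1):
        blocks.append(make_dae(2 * dim, hidden_dim, dim, rng))
    return blocks


def stacked_forward(blocks, noisy):
    """Run the stack; `noisy` has shape (batch, dim)."""
    estimate = dae_forward(blocks[0], noisy)
    for params in blocks[1:]:
        # Difference between the noisy x-vector and the predecessor's output.
        residual = noisy - estimate
        # Input = predecessor output concatenated with that difference.
        stacked_input = np.concatenate([estimate, residual], axis=-1)
        estimate = dae_forward(params, stacked_input)
    return estimate


# Example with random stand-in data; 512 is an assumed x-vector dimension.
noisy_xvectors = np.random.default_rng(1).normal(size=(4, 512))
stack = build_stack(dim=512, hidden_dim=256, num_blocks=3)
denoised = stacked_forward(stack, noisy_xvectors)
print(denoised.shape)  # (4, 512)
```

Passing the difference alongside the previous estimate gives each later DAE access to the part of the noisy x-vector that its predecessor did not reproduce, which is one plausible reading of why the stack is wired this way.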


DOI: 10.21437/Odyssey.2020-11

Cite as: Mohammadamini, M., Matrouf, D., Noé, P.-G. (2020) Denoising x-vectors for Robust Speaker Recognition. Proc. Odyssey 2020 The Speaker and Language Recognition Workshop, 75-80, DOI: 10.21437/Odyssey.2020-11.


@inproceedings{Mohammadamini2020,
  author={Mohammad Mohammadamini and Driss Matrouf and Paul-Gauthier Noé},
  title={{Denoising x-vectors for Robust Speaker Recognition}},
  year=2020,
  booktitle={Proc. Odyssey 2020 The Speaker and Language Recognition Workshop},
  pages={75--80},
  doi={10.21437/Odyssey.2020-11},
  url={http://dx.doi.org/10.21437/Odyssey.2020-11}
}