Avoiding Speaker Overfitting in End-to-End DNNs Using Raw Waveform for Text-Independent Speaker Verification

Jee-weon Jung, Hee-soo Heo, IL-ho Yang, Hye-jin Shim, Ha-jin Yu


In this paper, we propose a novel end-to-end DNN that operates directly on raw waveforms for text-independent speaker verification. Many speaker verification studies use the speaker embedding scheme, in which a deep neural network is trained as a speaker identifier and then used to extract speaker features. However, this scheme has an intrinsic limitation: the speaker feature, trained only to classify known speakers, is required to represent the identity of unknown speakers. Owing to this mismatch, speaker embedding systems tend to generalize well to unseen utterances from known speakers but overfit to those known speakers. We refer to this phenomenon as speaker overfitting. We investigate regularization techniques, a multi-step training scheme, and a residual connection with pooling layers from the perspective of mitigating speaker overfitting, all of which lead to considerable performance improvements. The effectiveness of each technique is evaluated on the VoxCeleb dataset, which comprises over 1,200 speakers recorded in various uncontrolled environments. To the best of our knowledge, we are the first to verify the success of end-to-end DNNs that directly use raw waveforms in a text-independent scenario. The proposed system achieves an equal error rate of 7.4%, which is lower than that of i-vector/probabilistic linear discriminant analysis and of end-to-end DNNs that use spectrograms.
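
The equal error rate (EER) reported above is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how an EER can be computed from verification trial scores, assuming higher scores mean "more likely the same speaker". The function name `equal_error_rate` and the toy trial data are illustrative assumptions and are not taken from the paper or its evaluation code.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Approximate the EER from trial scores (hypothetical helper).

    scores: similarity score per trial, higher = more likely same speaker.
    labels: 1/True for target (same-speaker) trials, 0/False for impostor trials.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)

    # Use every distinct score as a candidate decision threshold.
    thresholds = np.unique(scores)

    # False acceptance rate: impostor trials scoring at or above the threshold.
    far = np.array([(scores[~labels] >= t).mean() for t in thresholds])
    # False rejection rate: target trials scoring below the threshold.
    frr = np.array([(scores[labels] < t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest
    # and report their average as the EER.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


# Toy example: four target trials and four impostor trials (made-up scores).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(f"EER = {equal_error_rate(toy_scores, toy_labels):.2%}")
```

With a finite set of trials the two error rates may not cross exactly, so this sketch averages them at the closest threshold; evaluation toolkits often interpolate instead, which can give slightly different values.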


DOI: 10.21437/Interspeech.2018-1608

Cite as: Jung, J., Heo, H., Yang, I., Shim, H., Yu, H. (2018) Avoiding Speaker Overfitting in End-to-End DNNs Using Raw Waveform for Text-Independent Speaker Verification. Proc. Interspeech 2018, 3583-3587, DOI: 10.21437/Interspeech.2018-1608.


@inproceedings{Jung2018,
  author={Jee-weon Jung and Hee-soo Heo and IL-ho Yang and Hye-jin Shim and Ha-jin Yu},
  title={Avoiding Speaker Overfitting in End-to-End DNNs Using Raw Waveform for Text-Independent Speaker Verification},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3583--3587},
  doi={10.21437/Interspeech.2018-1608},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1608}
}