Denoising and Raw-waveform Networks for Weakly-Supervised Gender Identification on Noisy Speech

Jilt Sebastian, Manoj Kumar, Pavan Kumar D. S., Mathew Magimai.-Doss, Hema Murthy, Shrikanth Narayanan


This paper presents a raw-waveform neural network and uses it, together with a denoising network, for clustering in weakly-supervised learning scenarios under extreme noise conditions. Specifically, we consider language-independent gender identification across varied noise conditions and signal-to-noise ratios (SNRs). We formulate denoising as a source separation task and train the system with a discriminative criterion in order to enhance output SNRs. A denoising recurrent neural network (RNN) is first trained on a small subset (roughly one-fifth) of the data to learn a speech-specific mask. The denoised speech signal is then fed directly as input to a raw-waveform convolutional neural network (CNN) trained on denoised speech. We evaluate the standalone performance of the denoiser in terms of various signal-to-noise measures and discuss its contribution towards robust gender identification. The combined pipeline achieves absolute improvements of 11.06% and 13.33% over the i-vector SVM baseline system for 0 dB and -5 dB SNR conditions, respectively. We further analyse the information captured by the first CNN layer for both noisy and denoised speech.
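The two-stage data flow described in the abstract can be sketched in a minimal, hypothetical form: a speech mask is applied element-wise to the noisy signal (in the paper this mask is predicted by the denoising RNN), and the denoised waveform is then passed directly through a 1-D convolution, standing in for the first layer of the raw-waveform CNN. The mask, filter values, and signal here are toy placeholders, not the trained system.

```python
# Minimal sketch (plain Python, toy values) of the pipeline's data flow:
# Stage 1: mask-based denoising; Stage 2: 1-D convolution on the raw
# denoised waveform. In the actual system the mask comes from a trained
# RNN and the filters from a trained CNN; both are fixed here only to
# illustrate how the stages connect.

def apply_mask(noisy, mask):
    """Stage 1: element-wise speech mask, as in mask-based source separation."""
    assert len(noisy) == len(mask)
    return [x * m for x, m in zip(noisy, mask)]

def conv1d(signal, kernel):
    """Stage 2: one raw-waveform convolution filter ("valid" padding)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# Toy example: a clean ramp buried in alternating-sign noise.
clean = [0.1 * t for t in range(8)]
noise = [(-1) ** t * 0.5 for t in range(8)]
noisy = [c + n for c, n in zip(clean, noise)]

# An oracle ratio mask that attenuates noise-dominated samples
# (stand-in for the RNN-predicted speech-specific mask).
mask = [abs(c) / (abs(c) + abs(n)) for c, n in zip(clean, noise)]

denoised = apply_mask(noisy, mask)
features = conv1d(denoised, [0.25, 0.5, 0.25])  # smoothing filter as a stand-in
print(len(features))  # 8-sample input, kernel size 3 -> 6 output samples
```

The point of the sketch is the interface between the stages: the denoiser outputs a waveform of the same length as its input, so the raw-waveform CNN can consume it unchanged, with no hand-crafted feature extraction in between.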


DOI: 10.21437/Interspeech.2018-2321

Cite as: Sebastian, J., Kumar, M., D. S., P.K., Magimai.-Doss, M., Murthy, H., Narayanan, S. (2018) Denoising and Raw-waveform Networks for Weakly-Supervised Gender Identification on Noisy Speech. Proc. Interspeech 2018, 292-296, DOI: 10.21437/Interspeech.2018-2321.


@inproceedings{Sebastian2018,
  author={Jilt Sebastian and Manoj Kumar and Pavan Kumar {D. S.} and Mathew Magimai.-Doss and Hema Murthy and Shrikanth Narayanan},
  title={Denoising and Raw-waveform Networks for Weakly-Supervised Gender Identification on Noisy Speech},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={292--296},
  doi={10.21437/Interspeech.2018-2321},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2321}
}