Speech Enhancement for Noise-Robust Speech Synthesis Using Wasserstein GAN

Nagaraj Adiga, Yannis Pantazis, Vassilis Tsiaras, Yannis Stylianou

The quality of speech synthesis systems can deteriorate significantly when the training recordings contain background noise. Although existing speech enhancement techniques suppress additive noise effectively under low signal-to-noise ratio (SNR) conditions, they have been neither designed for nor tested in speech synthesis tasks, where the background noise has relatively low energy. In this paper, we propose a speech enhancement technique based on generative adversarial networks (GANs) that acts as a preprocessing step for speech synthesis. Motivated by the speech enhancement generative adversarial network (SEGAN) approach and recent advances in deep learning, we propose to apply a Wasserstein GAN (WGAN) with gradient penalty and gated activation functions to the autoencoder network of SEGAN. We studied the impact of the proposed method on a data set consisting of 28 speakers and different noise types at three different SNR levels. The effectiveness of the proposed method in the context of speech synthesis is demonstrated through the training of a WaveNet vocoder. We compare our method against SEGAN. Both subjective and objective metrics confirm that the proposed speech enhancement approach outperforms SEGAN in terms of speech synthesis quality.
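The two ingredients named in the abstract, gated activations and the WGAN gradient penalty, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes WaveNet-style gating (tanh multiplied by sigmoid) and uses a toy linear critic D(x) = w . x so that the gradient with respect to the input is known in closed form. The function names and the penalty weight `lam=10.0` are hypothetical choices for illustration and are not taken from the paper.

```python
import math
import random


def gated_activation(filter_pre, gate_pre):
    """Element-wise gated unit: tanh(filter) * sigmoid(gate).

    Assumes WaveNet-style gating; the abstract does not spell out the form.
    """
    return [
        math.tanh(f) * (1.0 / (1.0 + math.exp(-g)))
        for f, g in zip(filter_pre, gate_pre)
    ]


def critic(weights, x):
    """Toy linear critic D(x) = w . x (a stand-in for a neural network)."""
    return sum(w * v for w, v in zip(weights, x))


def gradient_penalty(weights, real, fake, lam=10.0, rng=random):
    """Gradient penalty lam * (||grad_x D(x_hat)||_2 - 1)^2.

    x_hat is a random interpolate between a real and a generated sample.
    For the linear toy critic the gradient equals `weights` at every point,
    so x_hat is computed only to mirror the general recipe.
    """
    eps = rng.random()
    x_hat = [eps * r + (1.0 - eps) * f for r, f in zip(real, fake)]
    grad = list(weights)  # dD/dx evaluated at x_hat (constant for a linear D)
    grad_norm = math.sqrt(sum(g * g for g in grad))
    return lam * (grad_norm - 1.0) ** 2


def critic_loss(weights, real, fake, lam=10.0):
    """Critic objective: D(fake) - D(real) + gradient penalty."""
    return (
        critic(weights, fake)
        - critic(weights, real)
        + gradient_penalty(weights, real, fake, lam)
    )
```

In a real system the critic would be a neural network and the gradient at `x_hat` would come from automatic differentiation. The penalty pushes the gradient norm toward 1, which is how the gradient-penalty variant of WGAN encourages the Lipschitz constraint on the critic.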

DOI: 10.21437/Interspeech.2019-2648

Cite as: Adiga, N., Pantazis, Y., Tsiaras, V., Stylianou, Y. (2019) Speech Enhancement for Noise-Robust Speech Synthesis Using Wasserstein GAN. Proc. Interspeech 2019, 1821-1825, DOI: 10.21437/Interspeech.2019-2648.

  author={Nagaraj Adiga and Yannis Pantazis and Vassilis Tsiaras and Yannis Stylianou},
  title={{Speech Enhancement for Noise-Robust Speech Synthesis Using Wasserstein GAN}},
  booktitle={Proc. Interspeech 2019},