Speech Enhancement Using Forked Generative Adversarial Networks with Spectral Subtraction

Ju Lin, Sufeng Niu, Zice Wei, Xiang Lan, Adriaan J. van Wijngaarden, Melissa C. Smith, Kuang-Ching Wang


Speech enhancement techniques that use a generative adversarial network (GAN) can effectively suppress noise while allowing models to be trained end-to-end. However, such techniques directly operate on time-domain waveforms, which are often high-dimensional and require extensive computation. This paper proposes a novel GAN-based speech enhancement method, referred to as S-ForkGAN, that operates on log-power spectra rather than on time-domain speech waveforms, and uses a forked GAN structure to extract both speech and noise information. By operating on log-power spectra, conventional spectral subtraction techniques can be incorporated seamlessly, and the parameter space typically has a lower dimension. The performance of S-ForkGAN is assessed for automatic speech recognition (ASR) using the TIMIT data set and a wide range of noise conditions. It is shown that S-ForkGAN outperforms existing GAN-based techniques and has lower complexity.
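For context on the spectral-domain processing the abstract refers to, the sketch below implements classic power-domain spectral subtraction with a noise estimate taken from the leading frames. This is an illustrative baseline only, not the paper's S-ForkGAN model; the function name, frame parameters, and the assumption that the first few frames are noise-only are all choices made for this example.

```python
import numpy as np

def spectral_subtract(noisy, n_fft=512, hop=128, noise_frames=6, floor=1e-3):
    """Illustrative spectral subtraction (not the paper's S-ForkGAN).

    Noise power is estimated from the first `noise_frames` frames,
    which are assumed to contain noise only.
    """
    win = np.hanning(n_fft)
    # Frame the signal and take the STFT of each windowed frame.
    n_frames = 1 + (len(noisy) - n_fft) // hop
    frames = np.stack([noisy[i * hop:i * hop + n_fft] * win
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    power = np.abs(spec) ** 2
    # Estimate the noise power spectrum from the leading frames.
    noise_pow = power[:noise_frames].mean(axis=0)
    # Subtract the noise estimate; apply a spectral floor to avoid
    # negative power (a standard guard against "musical noise").
    clean_pow = np.maximum(power - noise_pow, floor * power)
    # Rescale magnitudes, keep the noisy phase, and invert.
    enhanced = np.sqrt(clean_pow) * np.exp(1j * np.angle(spec))
    out = np.zeros(len(noisy))
    wsum = np.zeros(len(noisy))
    for i, f in enumerate(np.fft.irfft(enhanced, n=n_fft, axis=1)):
        out[i * hop:i * hop + n_fft] += f * win
        wsum[i * hop:i * hop + n_fft] += win ** 2
    # Normalize the overlap-add only where the window sum is significant.
    nz = wsum > 1e-3
    out[nz] /= wsum[nz]
    return out
```

Operating on (log-)power spectra as above halves the effective dimensionality relative to raw waveforms (one real value per frequency bin instead of one sample per time step at the frame level), which is the efficiency argument the abstract makes.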


DOI: 10.21437/Interspeech.2019-2954

Cite as: Lin, J., Niu, S., Wei, Z., Lan, X., van Wijngaarden, A.J., Smith, M.C., Wang, K. (2019) Speech Enhancement Using Forked Generative Adversarial Networks with Spectral Subtraction. Proc. Interspeech 2019, 3163-3167, DOI: 10.21437/Interspeech.2019-2954.


@inproceedings{Lin2019,
  author={Ju Lin and Sufeng Niu and Zice Wei and Xiang Lan and Adriaan J. van Wijngaarden and Melissa C. Smith and Kuang-Ching Wang},
  title={{Speech Enhancement Using Forked Generative Adversarial Networks with Spectral Subtraction}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3163--3167},
  doi={10.21437/Interspeech.2019-2954},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2954}
}