ISCA Archive Interspeech 2021

Improved Speech Enhancement Using a Complex-Domain GAN with Fused Time-Domain and Time-Frequency Domain Constraints

Feng Dang, Pengyuan Zhang, Hangting Chen

Complex-domain models have achieved promising results for speech enhancement (SE) tasks. Some complex-domain models consider only time-frequency (T-F) domain constraints and do not exploit information at the time-domain waveform level. Others consider only time-domain constraints and ignore T-F domain constraints, which carry rich harmonic structure information. Still others consider both time-domain and T-F domain constraints but use only a simple mean square loss as the T-F domain constraint. This paper proposes a complex-domain speech enhancement method that integrates time-domain and T-F domain constraints into a unified framework using a Generative Adversarial Network (GAN). The proposed framework captures waveform-level information in the time domain while attending to harmonic structure through the combined time-domain and T-F domain constraints. We evaluated the proposed method on the Voice Bank + DEMAND dataset. Experimental results show that it improves the PESQ score by 0.09 and the STOI score by 1% over the strong baseline, the deep complex convolution recurrent network (DCCRN), and outperforms state-of-the-art GAN-based SE systems.
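
As an illustration of what combining a time-domain constraint with a T-F domain constraint can look like, the following is a minimal sketch only. The abstract does not give the loss formulation, so the function names (`stft`, `fused_loss`), the weight `alpha`, the L1 waveform term, the mean square complex-spectrum term, and the STFT settings are all assumptions for illustration. In particular, the T-F term here is the simple mean square form that the abstract describes as a limitation of earlier work, not the adversarial (GAN-based) constraint the paper proposes.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Complex STFT with a Hann window; returns shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)


def fused_loss(enhanced, clean, alpha=0.5, n_fft=512, hop=128):
    """Weighted sum of a waveform-level term and a complex-spectrum term.

    Hypothetical formulation, not the paper's: alpha balances the two terms.
    """
    # Time-domain constraint: mean absolute error on raw samples.
    time_term = np.mean(np.abs(enhanced - clean))
    # T-F domain constraint: mean square error on the complex spectrum,
    # which penalizes both magnitude and phase differences.
    spec_enhanced = stft(enhanced, n_fft, hop)
    spec_clean = stft(clean, n_fft, hop)
    tf_term = np.mean(np.abs(spec_enhanced - spec_clean) ** 2)
    return alpha * time_term + (1.0 - alpha) * tf_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(4000)                 # stand-in for clean speech
    noisy = clean + 0.1 * rng.standard_normal(4000)   # stand-in for model output
    print(fused_loss(noisy, clean))
```

The loss is zero when the two signals are identical and grows as they diverge in either domain. In a GAN setting such a term would typically be added to the generator's adversarial loss, but how the paper weights or structures these terms is not stated in the abstract.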


doi: 10.21437/Interspeech.2021-1134

Cite as: Dang, F., Zhang, P., Chen, H. (2021) Improved Speech Enhancement Using a Complex-Domain GAN with Fused Time-Domain and Time-Frequency Domain Constraints. Proc. Interspeech 2021, 2721-2725, doi: 10.21437/Interspeech.2021-1134

@inproceedings{dang21_interspeech,
  author={Feng Dang and Pengyuan Zhang and Hangting Chen},
  title={{Improved Speech Enhancement Using a Complex-Domain GAN with Fused Time-Domain and Time-Frequency Domain Constraints}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={2721--2725},
  doi={10.21437/Interspeech.2021-1134}
}