Complex-domain models have achieved promising results for speech enhancement (SE) tasks. Some complex-domain models consider only time-frequency (T-F) domain constraints and do not take advantage of information at the time-domain waveform level. Others consider only time-domain constraints and ignore T-F domain constraints, which carry rich harmonic structure information. Still others consider both time-domain and T-F domain constraints but use only a simple mean squared error (MSE) loss as the T-F domain constraint. This paper proposes a complex-domain speech enhancement method that integrates time-domain and T-F domain constraints into a unified framework using a Generative Adversarial Network (GAN). Through these combined constraints, the proposed framework captures time-domain waveform-level features while also attending to the harmonic structure. We evaluated the proposed method on the Voice Bank + DEMAND dataset. Experimental results show that it improves the PESQ score by 0.09 and the STOI score by 1% over the strong baseline, the deep complex convolution recurrent network (DCCRN), and outperforms state-of-the-art GAN-based SE systems.
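The abstract does not spell out the exact loss terms, so the following is only a minimal NumPy sketch of what a generator objective fusing time-domain and T-F domain constraints with an adversarial term could look like. It assumes an L1 waveform loss as the time-domain constraint, an MSE on the complex spectrum (real and imaginary parts) plus an MSE on magnitudes as the T-F constraint, and a least-squares GAN term. The function names, STFT settings, and weights `alpha`, `beta`, `gamma` are hypothetical illustrations, not the authors' formulation.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Complex STFT with a Hann window; trailing samples that do not fill a frame are dropped."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)  # shape: (n_frames, n_fft // 2 + 1)

def time_domain_loss(est, ref):
    """Assumed time-domain constraint: mean absolute error between waveforms."""
    return float(np.mean(np.abs(est - ref)))

def tf_domain_loss(est, ref, n_fft=512, hop=128):
    """Assumed T-F constraint: complex (real/imag) MSE plus magnitude MSE."""
    s_est, s_ref = stft(est, n_fft, hop), stft(ref, n_fft, hop)
    complex_mse = np.mean(np.abs(s_est - s_ref) ** 2)  # |diff|^2 = re^2 + im^2
    mag_mse = np.mean((np.abs(s_est) - np.abs(s_ref)) ** 2)
    return float(complex_mse + mag_mse)

def generator_loss(est, ref, d_score, alpha=1.0, beta=1.0, gamma=0.1):
    """Hypothetical weighted sum of time-domain, T-F domain and adversarial terms."""
    adv = float(np.mean((d_score - 1.0) ** 2))  # least-squares GAN: push D(G(x)) toward 1
    return alpha * time_domain_loss(est, ref) + beta * tf_domain_loss(est, ref) + gamma * adv

# Usage: synthetic 1 s "clean" signal at 16 kHz and a noisy estimate of it.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noisy = clean + 0.1 * rng.standard_normal(16000)
perfect = generator_loss(clean, clean, d_score=np.ones(4))  # every term is zero
degraded = generator_loss(noisy, clean, d_score=np.ones(4))  # positive reconstruction terms
```

Keeping the two reconstruction terms separate, each with its own weight, makes it easy to see how much the waveform-level and spectral (harmonic) constraints each contribute; in a real system the same terms would be written in a differentiable framework and back-propagated through the generator.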
Cite as: Dang, F., Zhang, P., Chen, H. (2021) Improved Speech Enhancement Using a Complex-Domain GAN with Fused Time-Domain and Time-Frequency Domain Constraints. Proc. Interspeech 2021, 2721-2725, doi: 10.21437/Interspeech.2021-1134
@inproceedings{dang21_interspeech, author={Feng Dang and Pengyuan Zhang and Hangting Chen}, title={{Improved Speech Enhancement Using a Complex-Domain GAN with Fused Time-Domain and Time-Frequency Domain Constraints}}, year=2021, booktitle={Proc. Interspeech 2021}, pages={2721--2725}, doi={10.21437/Interspeech.2021-1134} }