Deep Learning for Joint Acoustic Echo and Noise Cancellation with Nonlinear Distortions

Hao Zhang, Ke Tan, DeLiang Wang


We formulate acoustic echo and noise cancellation jointly as deep learning-based speech separation, where near-end speech is separated from a single microphone recording and sent to the far end. We propose a causal system to address this problem, which incorporates a convolutional recurrent network (CRN) and a recurrent network with long short-term memory (LSTM). The system is trained to estimate the real and imaginary spectrograms of near-end speech and to detect near-end speech activity from the microphone signal and the far-end signal. Subsequently, the estimated real and imaginary spectrograms are used to separate the near-end signal, hence removing echo and noise. The trained near-end speech detector is employed to further suppress residual echo and noise. Evaluation results show that the proposed method effectively removes acoustic echo and background noise in the presence of nonlinear distortions for both simulated and measured room impulse responses (RIRs). Additionally, the proposed method generalizes well to untrained noises, RIRs, and speakers.
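The abstract states that the network estimates the real and imaginary spectrograms of the near-end speech, from which the time-domain near-end signal is resynthesized. The sketch below illustrates that resynthesis step only: a minimal numpy STFT/iSTFT pair (periodic Hann window, 50% overlap, so the constant-overlap-add condition holds) with the true spectrograms standing in for the network's outputs. The window size, hop, and the identity "model" are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Frame the signal, apply a periodic Hann window, take real FFTs."""
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * win
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    # The two training targets described in the abstract:
    return spec.real, spec.imag

def istft(re, im, n_fft=256, hop=128, length=None):
    """Inverse-FFT each frame and overlap-add.

    A periodic Hann window at 50% overlap sums to a constant (COLA),
    so plain overlap-add reconstructs the interior samples exactly.
    """
    frames = np.fft.irfft(re + 1j * im, n=n_fft, axis=1)
    out = np.zeros(n_fft + hop * (frames.shape[0] - 1))
    for i, f in enumerate(frames):
        out[i * hop : i * hop + n_fft] += f
    return out if length is None else out[:length]

# Toy signal standing in for the microphone recording.
rng = np.random.default_rng(0)
mic = rng.standard_normal(4096)

# In the paper, a trained CRN maps the microphone and far-end STFTs to
# the near-end real/imaginary spectrograms; here the true spectrograms
# are passed straight through as a stand-in for those estimates.
re, im = stft(mic)
rec = istft(re, im, length=len(mic))

# Interior samples (away from the un-overlapped edges) match the input.
err = np.max(np.abs(rec[256:-256] - mic[256:-256]))
```

Because the targets are the unmasked real and imaginary components (complex spectral mapping rather than magnitude masking), the resynthesized waveform retains the estimated phase, which matters when echo and noise dominate the mixture.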


DOI: 10.21437/Interspeech.2019-2651

Cite as: Zhang, H., Tan, K., Wang, D. (2019) Deep Learning for Joint Acoustic Echo and Noise Cancellation with Nonlinear Distortions. Proc. Interspeech 2019, 4255-4259, DOI: 10.21437/Interspeech.2019-2651.


@inproceedings{Zhang2019,
  author={Hao Zhang and Ke Tan and DeLiang Wang},
  title={{Deep Learning for Joint Acoustic Echo and Noise Cancellation with Nonlinear Distortions}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={4255--4259},
  doi={10.21437/Interspeech.2019-2651},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2651}
}