Deep Learning for Acoustic Echo Cancellation in Noisy and Double-Talk Scenarios

Hao Zhang, DeLiang Wang


Traditional acoustic echo cancellation (AEC) works by identifying an acoustic impulse response using adaptive algorithms. We instead formulate AEC as a supervised speech separation problem that separates the loudspeaker signal from the near-end signal so that only the latter is transmitted to the far end. A recurrent neural network with bidirectional long short-term memory (BLSTM) is trained to estimate the ideal ratio mask from features extracted from mixtures of near-end and far-end signals. The BLSTM-estimated mask is then applied to separate out and suppress the far-end signal, hence removing the echo. Experimental results show the effectiveness of the proposed method for echo removal in double-talk, background-noise, and nonlinear-distortion scenarios. In addition, the proposed method generalizes to untrained speakers.
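The ideal ratio mask used as the training target can be sketched in the time-frequency domain: at each T-F unit, the mask is the ratio of near-end energy to total (near-end plus interference) energy. The snippet below is a minimal illustrative sketch, not the authors' implementation; the function names, the square-root form of the mask (beta = 0.5), and the small stabilizing constant are assumptions.

```python
import numpy as np

def ideal_ratio_mask(near_mag, interf_mag, beta=0.5):
    """Ideal ratio mask over magnitude spectrograms (freq x time).

    near_mag   -- magnitudes of the near-end speech
    interf_mag -- magnitudes of the interference (far-end echo + noise)
    beta=0.5 gives the square-root form common in speech separation.
    """
    near_pow = near_mag ** 2
    interf_pow = interf_mag ** 2
    # Small constant avoids division by zero in silent T-F units.
    return (near_pow / (near_pow + interf_pow + 1e-12)) ** beta

def apply_mask(mixture_mag, mask):
    """Pointwise masking: attenuates units dominated by the far-end signal."""
    return mask * mixture_mag
```

At inference time the BLSTM plays the role of `ideal_ratio_mask`, predicting the mask from features of the microphone mixture; the masked magnitudes are then combined with the mixture phase to resynthesize the near-end signal.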


DOI: 10.21437/Interspeech.2018-1484

Cite as: Zhang, H., Wang, D. (2018) Deep Learning for Acoustic Echo Cancellation in Noisy and Double-Talk Scenarios. Proc. Interspeech 2018, 3239-3243, DOI: 10.21437/Interspeech.2018-1484.


@inproceedings{Zhang2018,
  author={Hao Zhang and DeLiang Wang},
  title={Deep Learning for Acoustic Echo Cancellation in Noisy and Double-Talk Scenarios},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={3239--3243},
  doi={10.21437/Interspeech.2018-1484},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1484}
}