Due to the limitations of power amplifiers or loudspeakers, the echo signals captured in the microphones are not in a linear relationship with the far-end signals even when the echo path is perfectly linear. The nonlinear components of the echo cannot be successfully removed by a linear acoustic echo canceller. Residual echo suppression (RES) is a technique to suppress the remained echo after acoustic echo suppression (AES). Conventional approaches compute RES gain using Wiener filter or spectral subtraction method based on the estimated statistics on related signals. In this paper, we propose a deep neural network (DNN)-based RES gain estimation based on both the far-end and the AES output signals in all frequency bins. A DNN architecture, which is suitable to model a complicated nonlinear mapping between high-dimensional vectors, is employed as a regression function from these signals to the optimal RES gain. The proposed method can suppress the residual components without any explicit double-talk detectors. The experimental results show that our proposed approach outperforms a conventional method in terms of the echo return loss enhancement (ERLE) for single-talk periods and the perceptual evaluation of speech quality (PESQ) score for double-talk periods.
Bibliographic reference. Lee, Chul Min / Shin, Jong Won / Kim, Nam Soo (2015): "DNN-based residual echo suppression", In INTERSPEECH-2015, 1775-1779.