ISCA Archive Interspeech 2021

SE-Conformer: Time-Domain Speech Enhancement Using Conformer

Eesung Kim, Hyeji Seo

Convolution-augmented transformer (conformer) has recently shown competitive results in speech-domain applications such as automatic speech recognition, continuous speech separation, and sound event detection. Conformer captures both short- and long-term temporal information by combining multi-head self-attention, which attends to the whole sequence at once, with a convolutional neural network. However, the effectiveness of conformer for speech enhancement has not yet been demonstrated. In this paper, we propose an end-to-end speech enhancement architecture (SE-Conformer) that combines a convolutional encoder–decoder with conformer and operates directly on the time-domain signal. We evaluated the model on both the VoiceBank-DEMAND Corpus (VCTK) and Librispeech datasets using objective speech quality metrics. The experimental results show that the proposed model outperforms other competitive baselines on these speech enhancement metrics.
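The abstract describes the architecture only at a high level: a convolutional encoder turns the waveform into a frame sequence, conformer blocks model that sequence, and a convolutional decoder maps it back to samples. The NumPy sketch below illustrates this data flow with random, untrained weights. It assumes the conformer block layout introduced for speech recognition (half-step feed-forward, multi-head self-attention, convolution module, half-step feed-forward, final layer norm). The layer sizes, kernel widths, the single-layer encoder and decoder, the omission of batch normalization, dropout, positional encoding and any skip connections, and all function names (`encode`, `conformer_block`, `decode`, etc.) are hypothetical choices for illustration, not the SE-Conformer configuration.

```python
import numpy as np

rng = np.random.default_rng(0)  # random, untrained weights: illustration only

def layer_norm(x, eps=1e-5):
    # Normalize each frame over its feature axis (learned gain/bias omitted).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    return x * sigmoid(x)

def feed_forward(x, w1, w2):
    # Pre-norm feed-forward module: expand, Swish, project back.
    return swish(layer_norm(x) @ w1) @ w2

def self_attention(x, wq, wk, wv, wo, n_heads):
    # Multi-head scaled dot-product attention: every frame attends to all T frames.
    t, d = x.shape
    dh = d // n_heads
    h = layer_norm(x)
    q, k, v = [(h @ w).reshape(t, n_heads, dh).transpose(1, 0, 2) for w in (wq, wk, wv)]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)    # (heads, T, T)
    scores -= scores.max(axis=-1, keepdims=True)       # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return (attn @ v).transpose(1, 0, 2).reshape(t, d) @ wo

def conv_module(x, w_pw1, w_dw, w_pw2):
    # Pointwise conv -> GLU -> depthwise conv over time -> Swish -> pointwise conv.
    t, d = x.shape
    h = layer_norm(x) @ w_pw1                          # (T, 2d)
    h = h[:, :d] * sigmoid(h[:, d:])                   # GLU halves the channels
    k = w_dw.shape[0]                                  # odd kernel width
    hp = np.pad(h, ((k // 2, k // 2), (0, 0)))         # "same" padding in time
    h = sum(hp[i:i + t] * w_dw[i] for i in range(k))   # each frame sees k neighbours
    return swish(h) @ w_pw2

def conformer_block(x, p, n_heads=4):
    # Macaron layout: half-step FFN, self-attention, convolution, half-step FFN.
    x = x + 0.5 * feed_forward(x, *p["ff1"])
    x = x + self_attention(x, *p["mhsa"], n_heads)
    x = x + conv_module(x, *p["conv"])
    x = x + 0.5 * feed_forward(x, *p["ff2"])
    return layer_norm(x)

def init_params(d, ff_mult=4, kernel=15):
    def w(*shape):
        return rng.standard_normal(shape) / np.sqrt(shape[0])
    return {
        "ff1": (w(d, ff_mult * d), w(ff_mult * d, d)),
        "mhsa": (w(d, d), w(d, d), w(d, d), w(d, d)),
        "conv": (w(d, 2 * d), w(kernel, d), w(d, d)),
        "ff2": (w(d, ff_mult * d), w(ff_mult * d, d)),
    }

def encode(wave, w_enc, stride):
    # Strided 1-D convolution: frame the waveform, project each frame to d channels.
    k = w_enc.shape[0]
    n = (len(wave) - k) // stride + 1
    frames = np.stack([wave[i * stride:i * stride + k] for i in range(n)])  # (T, k)
    return np.maximum(frames @ w_enc, 0.0)             # ReLU -> (T, d)

def decode(feats, w_dec, stride, length):
    # Transposed 1-D convolution written as overlap-add of per-frame outputs.
    k = w_dec.shape[1]
    out = np.zeros(length)
    for i, frame in enumerate(feats @ w_dec):          # (T, k)
        out[i * stride:i * stride + k] += frame
    return out

d, k, stride = 64, 16, 8
wave = rng.standard_normal(4000)                       # 0.25 s of placeholder audio at 16 kHz
w_enc = rng.standard_normal((k, d)) / np.sqrt(k)
w_dec = rng.standard_normal((d, k)) / np.sqrt(d)

h = encode(wave, w_enc, stride)
for p in [init_params(d) for _ in range(2)]:
    h = conformer_block(h, p)
enhanced = decode(h, w_dec, stride, len(wave))
print(h.shape, enhanced.shape)                         # (499, 64) (4000,)
```

In this sketch, self-attention gives every frame access to the entire sequence (long-term context), while the depthwise convolution mixes only the frames within its kernel width (short-term context). This is the complementary behaviour the abstract attributes to conformer. The output has the same length as the input waveform, as a time-domain enhancer requires.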

doi: 10.21437/Interspeech.2021-2207

Cite as: Kim, E., Seo, H. (2021) SE-Conformer: Time-Domain Speech Enhancement Using Conformer. Proc. Interspeech 2021, 2736-2740, doi: 10.21437/Interspeech.2021-2207

  author={Eesung Kim and Hyeji Seo},
  title={{SE-Conformer: Time-Domain Speech Enhancement Using Conformer}},
  booktitle={Proc. Interspeech 2021},