ISCA Archive Interspeech 2021

Temporal Convolutional Network with Frequency Dimension Adaptive Attention for Speech Enhancement

Qiquan Zhang, Qi Song, Aaron Nicolson, Tian Lan, Haizhou Li

Despite much progress, most temporal convolutional network (TCN) based speech enhancement models focus mainly on modeling the long-term temporal contextual dependencies of speech frames, without taking into account the distribution information of the speech signal in the frequency dimension. In this study, we propose a frequency dimension adaptive attention (FAA) mechanism to improve TCNs, which guides the model to selectively emphasize the frequency-wise features that carry important speech information and also improves the representation capability of the network. Our extensive experimental investigation demonstrates that the proposed FAA mechanism consistently provides significant improvements in terms of speech quality (PESQ), intelligibility (STOI), and three other composite metrics. More promisingly, it generalizes better to real-world noisy environments.


doi: 10.21437/Interspeech.2021-46

Cite as: Zhang, Q., Song, Q., Nicolson, A., Lan, T., Li, H. (2021) Temporal Convolutional Network with Frequency Dimension Adaptive Attention for Speech Enhancement. Proc. Interspeech 2021, 166-170, doi: 10.21437/Interspeech.2021-46

@inproceedings{zhang21b_interspeech,
  author={Qiquan Zhang and Qi Song and Aaron Nicolson and Tian Lan and Haizhou Li},
  title={{Temporal Convolutional Network with Frequency Dimension Adaptive Attention for Speech Enhancement}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={166--170},
  doi={10.21437/Interspeech.2021-46}
}