SNR-Aware Convolutional Neural Network Modeling for Speech Enhancement

Szu-Wei Fu, Yu Tsao, Xugang Lu


This paper proposes a signal-to-noise-ratio (SNR) aware convolutional neural network (CNN) model for speech enhancement (SE). Because the CNN model can deal with local temporal-spectral structures of speech signals, it can effectively disentangle the speech and noise signals given the noisy speech signals. To enhance generalization capability and accuracy, we propose two SNR-aware algorithms for CNN modeling. The first algorithm employs a multi-task learning (MTL) framework, in which restoring clean speech and estimating the SNR level are formulated as the main and the secondary tasks, respectively, given the noisy speech input. The second algorithm is SNR-adaptive denoising, in which the SNR level is explicitly predicted in the first step, and then an SNR-dependent CNN model is selected for denoising. Experiments were carried out to test the two SNR-aware algorithms for CNN modeling. Results demonstrate that the CNN with the two proposed SNR-aware algorithms outperforms its deep neural network counterpart in terms of standardized objective evaluations when using the same number of layers and nodes. Moreover, the SNR-aware algorithms can improve the denoising performance at unseen SNR levels, suggesting their promising generalization capability for real-world applications.
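The two SNR-aware algorithms described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the SNR level is represented or how the two tasks are weighted, so the sketch assumes the SNR level is a discrete class, the secondary task uses a cross-entropy loss, and a hypothetical weight `alpha` balances the two terms. The names `mtl_loss`, `snr_adaptive_denoise`, `snr_classifier`, and `denoisers` are all illustrative placeholders.

```python
import math


def mtl_loss(enhanced, clean, snr_logits, snr_label, alpha=0.1):
    """Joint loss for the multi-task setup (assumed form).

    Main task: mean squared error between enhanced and clean features.
    Secondary task: cross-entropy over discrete SNR classes (an assumption;
    the abstract does not specify classification versus regression).
    alpha is a hypothetical weight on the secondary task.
    """
    mse = sum((e - c) ** 2 for e, c in zip(enhanced, clean)) / len(clean)
    # Numerically stable log-sum-exp for the softmax normalizer.
    peak = max(snr_logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in snr_logits))
    cross_entropy = log_norm - snr_logits[snr_label]
    return mse + alpha * cross_entropy


def snr_adaptive_denoise(noisy, snr_classifier, denoisers):
    """Two-step denoising: predict the SNR class, then route the input
    to the model associated with that class.

    snr_classifier returns one score per SNR class; denoisers is a list
    of callables indexed by SNR class. Both stand in for trained models.
    """
    scores = snr_classifier(noisy)
    snr_class = max(range(len(scores)), key=scores.__getitem__)
    return denoisers[snr_class](noisy), snr_class
```

In the second function, any callable can stand in for a trained model, for example `snr_adaptive_denoise(frame, classifier, [low_snr_model, high_snr_model])`, where those three names are again hypothetical.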


DOI: 10.21437/Interspeech.2016-211

Cite as

Fu, S., Tsao, Y., Lu, X. (2016) SNR-Aware Convolutional Neural Network Modeling for Speech Enhancement. Proc. Interspeech 2016, 3768-3772.

Bibtex
@inproceedings{Fu+2016,
author={Szu-Wei Fu and Yu Tsao and Xugang Lu},
title={SNR-Aware Convolutional Neural Network Modeling for Speech Enhancement},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-211},
url={http://dx.doi.org/10.21437/Interspeech.2016-211},
pages={3768--3772}
}