Causal Speech Enhancement Combining Data-Driven Learning and Suppression Rule Estimation

Seyedmahdad Mirsamadi, Ivan Tashev


Single-channel speech enhancement has traditionally been addressed with statistical signal processing algorithms designed to suppress time-frequency regions affected by noise. We study an alternative data-driven approach that uses deep neural networks (DNNs) to learn the transformation from noisy and reverberant speech to clean speech, with a focus on real-time applications that require low-latency causal processing. We examine several structures in which deep learning can be used within an enhancement system. These include end-to-end DNN regression from noisy to clean spectra, as well as less invasive approaches that estimate a suppression gain for each time-frequency bin instead of directly recovering the clean spectral features. We also propose a novel architecture in which the general structure of a conventional noise suppressor is preserved, but the sub-tasks are independently learned and carried out by separate networks. We show that DNN-based suppression gain estimation outperforms the regression approach in the causal processing mode and for noise types not seen during DNN training.
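The idea of estimating a suppression gain per time-frequency bin under a causal constraint can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (causal_context, placeholder_gain, enhance_with_gain), the context length n_past, and the Wiener-like placeholder rule standing in for a trained DNN are all assumptions made for illustration. The sketch only shows the processing structure: each frame's gain is computed from the current and past frames, bounded to [0, 1], and multiplied onto the noisy magnitude spectrum rather than replacing it.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero in silent bins


def causal_context(mag, t, n_past):
    """Return frame t stacked with its n_past predecessors (no look-ahead).

    Frames before the start of the signal are zero-padded.
    """
    n_bins = mag.shape[1]
    ctx = np.zeros((n_past + 1, n_bins))
    past = mag[max(0, t - n_past): t + 1]
    ctx[n_past + 1 - len(past):] = past
    return ctx


def placeholder_gain(ctx):
    """Hypothetical stand-in for a trained gain-estimation network.

    Uses the minimum magnitude over the causal context as a crude noise
    floor and applies a Wiener-like rule. A real system would replace this
    with a DNN forward pass on the same causal context.
    """
    current = ctx[-1]
    noise_floor = ctx.min(axis=0)
    gain = 1.0 - (noise_floor ** 2) / (current ** 2 + EPS)
    return np.clip(gain, 0.0, 1.0)


def enhance_with_gain(mag, n_past=4, gain_fn=placeholder_gain):
    """Apply a per-bin suppression gain frame by frame, using only past context."""
    out = np.empty_like(mag, dtype=float)
    for t in range(mag.shape[0]):
        gain = gain_fn(causal_context(mag, t, n_past))
        out[t] = gain * mag[t]  # attenuate the noisy spectrum; never amplify
    return out
```

Because the gain is bounded and applied to the observed spectrum, the output can never exceed the noisy input in any bin. A direct regression network has no such built-in constraint, which is one plausible way to read the abstract's description of gain estimation as the less invasive option.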


DOI: 10.21437/Interspeech.2016-437

Cite as

Mirsamadi, S., Tashev, I. (2016) Causal Speech Enhancement Combining Data-Driven Learning and Suppression Rule Estimation. Proc. Interspeech 2016, 2870-2874.

BibTeX
@inproceedings{Mirsamadi+2016,
author={Seyedmahdad Mirsamadi and Ivan Tashev},
title={Causal Speech Enhancement Combining Data-Driven Learning and Suppression Rule Estimation},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-437},
url={http://dx.doi.org/10.21437/Interspeech.2016-437},
pages={2870--2874}
}