Perception Optimized Deep Denoising AutoEncoders for Speech Enhancement

Prashanth Gurunath Shivakumar, Panayiotis Georgiou


Speech enhancement is a challenging and important area of research because many applications depend on improved signal quality. It serves as a pre-processing step in speech processing systems and is also used to perceptually improve the quality of speech for human listeners. With recent advances in Deep Neural Networks (DNN), deep Denoising Auto-Encoders have proved very successful for speech enhancement. In this paper, we propose a novel objective loss function that takes into account the perceptual quality of speech, and we use it to train Perceptually-Optimized Speech Denoising Auto-Encoders (POS-DAE). We demonstrate the effectiveness of POS-DAE in a speech enhancement task. Further, we introduce a two-level DNN architecture for denoising and enhancement. We show the effectiveness of the proposed methods on a high-noise subset of the QUT-NOISE-TIMIT database under mismatched noise conditions. Experiments compare POS-DAE against the Mean Square Error loss function using speech distortion, noise reduction, and Perceptual Evaluation of Speech Quality (PESQ). We find that the proposed loss function and the new two-stage architecture give significant improvements in perceptual speech quality measures, and that the improvements become more significant under higher noise conditions.
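
The abstract does not give the form of the proposed loss, so the following is only an illustrative sketch of the general idea of contrasting a plain Mean Square Error loss with one that weights errors by perceptual importance. The function names, the per-bin weights, and the example feature vectors are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def mse_loss(clean: Sequence[float], estimate: Sequence[float]) -> float:
    """Plain mean square error between two equal-length feature vectors."""
    if len(clean) != len(estimate):
        raise ValueError("clean and estimate must have the same length")
    return sum((c - e) ** 2 for c, e in zip(clean, estimate)) / len(clean)


def weighted_mse_loss(
    clean: Sequence[float],
    estimate: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Squared error per bin, scaled by a (hypothetical) perceptual weight.

    The result is normalized by the sum of the weights, so uniform
    weights reduce this to the plain MSE above.
    """
    if not (len(clean) == len(estimate) == len(weights)):
        raise ValueError("all inputs must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_error = sum(
        w * (c - e) ** 2 for w, c, e in zip(weights, clean, estimate)
    )
    return weighted_error / total_weight


# Hypothetical 4-bin spectral frames: the estimate is wrong in bins 0 and 3.
clean = [1.0, 0.5, 0.2, 0.1]
estimate = [0.8, 0.5, 0.2, 0.5]

# Assumed weights that emphasize the lower bins; purely illustrative values.
weights = [4.0, 2.0, 1.0, 1.0]

print("MSE:", mse_loss(clean, estimate))
print("Weighted MSE:", weighted_mse_loss(clean, estimate, weights))
```

With these assumed weights, the large error in the low-weight bin contributes less to the loss than it does under plain MSE, which is the kind of trade-off a perceptually motivated loss is meant to express.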


DOI: 10.21437/Interspeech.2016-1284

Cite as

Shivakumar, P.G., Georgiou, P. (2016) Perception Optimized Deep Denoising AutoEncoders for Speech Enhancement. Proc. Interspeech 2016, 3743-3747.

BibTeX
@inproceedings{Shivakumar+2016,
author={Prashanth Gurunath Shivakumar and Panayiotis Georgiou},
title={Perception Optimized Deep Denoising AutoEncoders for Speech Enhancement},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-1284},
url={http://dx.doi.org/10.21437/Interspeech.2016-1284},
pages={3743--3747}
}