Two-Stage Temporal Processing for Single-Channel Speech Enhancement

Suman Samui, Indrajit Chakrabarti, Soumya Kanti Ghosh

Most conventional speech enhancement methods that operate in the spectral domain suffer from a spurious artifact called musical noise. These methods also incur additional overhead for estimating the noise power spectral density. This paper proposes a speech enhancement framework that cascades two temporal processing stages. The first stage performs excitation-source-based temporal processing, which involves identifying and boosting the excitation-source-based, speech-specific features present at the gross and fine temporal levels. The second stage provides noise reduction by estimating the standard deviation of the noise in the time domain with a robust estimator. The proposed noise reduction stage is simple to implement and computationally inexpensive because it does not require noise estimation in the spectral domain as a pre-processing phase. Experimental results show that the proposed scheme yields, on average, a 60–65% improvement in speech quality (PESQ scores) and intelligibility (STOI scores) at 0 and -5 dB input SNR compared with existing standard approaches.
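The abstract does not say which robust estimator the second stage uses. As an illustration only, the sketch below assumes one common choice, the median absolute deviation (MAD) scaled by 0.6745, to estimate a noise standard deviation directly from time-domain samples. The function names `robust_noise_std` and `make_noisy_signal`, the synthetic test signal, and the burst model are all hypothetical and are not taken from the paper.

```python
import random
import statistics


def robust_noise_std(samples):
    """Estimate the noise standard deviation with the scaled MAD.

    Assumption: MAD / 0.6745 is used as the robust estimator; the paper's
    actual estimator may differ. The constant 0.6745 makes the estimate
    consistent with the standard deviation for Gaussian noise.
    """
    med = statistics.median(samples)
    abs_dev = [abs(s - med) for s in samples]
    return statistics.median(abs_dev) / 0.6745


def make_noisy_signal(n, sigma, seed=0, burst_prob=0.05, burst_amp=2.0):
    """Hypothetical test signal: Gaussian noise plus sparse large bursts.

    The bursts stand in for high-energy speech samples that would inflate
    an ordinary (non-robust) standard deviation estimate.
    """
    rng = random.Random(seed)
    signal = []
    for _ in range(n):
        x = rng.gauss(0.0, sigma)
        if rng.random() < burst_prob:
            x += rng.choice([-burst_amp, burst_amp])
        signal.append(x)
    return signal


if __name__ == "__main__":
    sig = make_noisy_signal(20000, sigma=0.1)
    print("robust estimate:", robust_noise_std(sig))
    print("ordinary std:   ", statistics.pstdev(sig))
```

In this toy setting the robust estimate stays close to the true noise level, while the ordinary standard deviation is dominated by the bursts. That is the usual motivation for a robust estimator when noise must be measured without a separate spectral-domain noise tracker.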

DOI: 10.21437/Interspeech.2016-307

Cite as

Samui, S., Chakrabarti, I., Ghosh, S.K. (2016) Two-Stage Temporal Processing for Single-Channel Speech Enhancement. Proc. Interspeech 2016, 3723-3727.

author={Suman Samui and Indrajit Chakrabarti and Soumya Kanti Ghosh},
title={Two-Stage Temporal Processing for Single-Channel Speech Enhancement},
booktitle={Interspeech 2016},