8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Spectral Subtraction with Full-Wave Rectification and Likelihood Controlled Instantaneous Noise Estimation for Robust Speech Recognition

Haitian Xu, Zheng-Hua Tan, Paul Dalsgaard, Borge Lindberg

Aalborg University, Denmark

In standard Spectral Subtraction (SS), Half-Wave Rectification (HWR-SS) is normally applied to avoid negative values in the Power Spectral Density (PSD). These values are usually attributed to inaccurate noise estimates obtained with a Voice Activity Detector (VAD). In this paper, analyses show that, given accurate noise estimation, the phase relationship between speech and noise becomes the dominant cause of the negative values. Full-Wave Rectification based SS (FWR-SS) combined with Instantaneous Noise Estimation (INE) is therefore proposed in place of VAD-based HWR-SS, as it better preserves the speech information carried by those negative values. The paper also shows that FWR-SS provides optimum orthogonality between the estimated noise and speech signals. The proposed INE method, Likelihood Controlled Instantaneous Noise Estimation (LCINE), combines long-term noise statistics obtained with a VAD with short-term instantaneous noise estimation. The combination of FWR-SS and LCINE is computationally efficient and yields a 51% error rate reduction on the Aurora 2 database compared with the basic Aurora front-end provided by ETSI [1].
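The phase argument can be made concrete. For a noisy spectrum Y(k) = X(k) + N(k), the power spectrum is |Y(k)|^2 = |X(k)|^2 + |N(k)|^2 + 2|X(k)||N(k)|cos θ(k), where θ(k) is the phase difference between speech and noise in bin k. Even when |N(k)|^2 is known exactly, the difference |Y(k)|^2 - |N(k)|^2 is negative whenever the cross term outweighs |X(k)|^2. The minimal Python sketch below checks this and contrasts the two rectification rules: HWR clamps a negative difference to zero, while FWR keeps its absolute value. It assumes synthetic spectra with random magnitudes and uniformly distributed phases (not real speech) and an oracle noise PSD in place of the paper's VAD or LCINE estimates. The helper names (`hwr`, `fwr`, `spectral_subtraction`, `synthetic_spectra`) are hypothetical.

```python
import cmath
import math
import random


def hwr(diff):
    """Half-wave rectification: a negative power difference becomes zero."""
    return max(diff, 0.0)


def fwr(diff):
    """Full-wave rectification: a negative power difference keeps its magnitude."""
    return abs(diff)


def spectral_subtraction(noisy_psd, noise_psd, rectify):
    """Subtract a noise PSD estimate bin by bin and rectify each difference."""
    return [rectify(y - n) for y, n in zip(noisy_psd, noise_psd)]


def synthetic_spectra(num_bins=256, seed=0):
    """Hypothetical speech/noise spectra: random magnitudes, uniform random phases."""
    rng = random.Random(seed)

    def draw():
        return cmath.rect(rng.uniform(0.5, 1.5), rng.uniform(-math.pi, math.pi))

    speech = [draw() for _ in range(num_bins)]
    noise = [draw() for _ in range(num_bins)]
    return speech, noise


speech, noise = synthetic_spectra()
noisy_psd = [abs(x + n) ** 2 for x, n in zip(speech, noise)]
noise_psd = [abs(n) ** 2 for n in noise]  # exact (oracle) noise PSD, no VAD error
speech_psd = [abs(x) ** 2 for x in speech]

# Raw difference |Y|^2 - |N|^2 equals |X|^2 plus the phase-dependent cross term
# 2 Re(X N*) = 2 |X| |N| cos(theta).
raw = [y - n for y, n in zip(noisy_psd, noise_psd)]
cross = [2 * (x * n.conjugate()).real for x, n in zip(speech, noise)]

hwr_out = spectral_subtraction(noisy_psd, noise_psd, hwr)
fwr_out = spectral_subtraction(noisy_psd, noise_psd, fwr)

negative_bins = [k for k, d in enumerate(raw) if d < 0]
print(f"negative bins despite exact noise PSD: {len(negative_bins)} of {len(raw)}")
```

Because the noise PSD here is exact, every negative bin comes from the cross term alone. HWR discards those bins entirely, whereas FWR retains a non-zero value derived from them; the two rules agree on all non-negative bins.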

