INTERSPEECH 2004 - ICSLP
In standard Spectral Subtraction (SS), Half-Wave Rectification (HWR-SS) is normally applied to avoid negative values in the Power Spectral Density (PSD), which occur mainly because of inaccurate noise estimation by a Voice Activity Detector (VAD). The analyses in this paper show that, given accurate noise estimation, the phase relationship between speech and noise becomes the dominant cause of the negative values. Full-Wave Rectification based SS (FWR-SS) combined with Instantaneous Noise Estimation (INE) is therefore proposed in place of VAD-based HWR-SS, as it is better able to preserve the speech information carried by those negative values. The paper also shows that FWR-SS provides optimum orthogonality between the estimated noise and speech signals. The INE method proposed here is Likelihood Controlled Instantaneous Noise Estimation (LCINE), which combines the long-term statistical characteristics of the noise, obtained from a VAD, with a short-term INE method. The combination of FWR-SS and LCINE is computationally efficient and shows a 51% error rate reduction on the Aurora 2 database in comparison with the basic Aurora front-end provided by ETSI.
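The difference between the two rectification rules can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the single-bin toy signal, and the assumption of a perfectly known noise power are all hypothetical choices made for illustration, and LCINE itself is not modelled. The sketch builds one spectral bin as the sum of a speech and a noise component with a chosen phase difference, subtracts the exact noise power, and then applies each rectification to the result.

```python
import cmath
import math


def noisy_power_bin(speech_mag, noise_mag, phase_diff):
    """Power of one spectral bin when speech and noise add with a phase offset."""
    speech = complex(speech_mag, 0.0)
    noise = cmath.rect(noise_mag, phase_diff)  # noise component at the given phase
    return abs(speech + noise) ** 2


def subtract_power(noisy_power, noise_power):
    """Plain power spectral subtraction, bin by bin (may go negative)."""
    return [y - n for y, n in zip(noisy_power, noise_power)]


def half_wave_rectify(diff):
    """HWR: negative bins are set to zero."""
    return [max(d, 0.0) for d in diff]


def full_wave_rectify(diff):
    """FWR: negative bins are replaced by their absolute value."""
    return [abs(d) for d in diff]


# Toy setup (assumed values): noise is stronger than speech, and the
# noise power is known exactly, so any negative value is a phase effect.
speech_mag, noise_mag = 1.0, 2.0
phases = [0.0, math.pi / 2, math.pi]  # in phase, orthogonal, opposite phase

noisy = [noisy_power_bin(speech_mag, noise_mag, p) for p in phases]
noise = [noise_mag ** 2] * len(phases)

diff = subtract_power(noisy, noise)
hwr = half_wave_rectify(diff)
fwr = full_wave_rectify(diff)

for p, d, h, f in zip(phases, diff, hwr, fwr):
    print(f"phase={p:.2f} rad  diff={d:+.2f}  HWR={h:.2f}  FWR={f:.2f}")
```

In this toy case the opposite-phase bin goes negative even though the noise power is exact, which is the phase effect the abstract describes. HWR discards that bin entirely, while FWR keeps a non-zero value. Whether this improves recognition accuracy is the paper's empirical claim and is not something the sketch demonstrates.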
Bibliographic reference. Xu, Haitian / Tan, Zheng-Hua / Dalsgaard, Paul / Lindberg, Borge (2004): "Spectral subtraction with full-wave rectification and likelihood controlled instantaneous noise estimation for robust speech recognition", In INTERSPEECH-2004, 2085-2088.