EUROSPEECH 2003 - INTERSPEECH 2003
This paper presents a noise-robust feature extraction algorithm that combines wavelet packet decomposition (WPD) with autoregressive (AR) modeling of the speech signal. In contrast to a time-frequency representation based on the short-time Fourier transform (STFT), a computationally efficient WPD can better represent the non-stationary parts of the speech signal (consonants). The vowels are well described by an AR model, as in LPC analysis. The separately extracted WPD-based and AR-based features are combined using a modified principal component analysis (PCA) and a voiced/unvoiced decision to produce the final output feature vector. Noise robustness is improved by applying the proposed wavelet-based denoising algorithm, which uses a modified soft thresholding procedure, together with voice activity detection. Speech recognition results on the Aurora 3 databases show a performance improvement of 47.6% relative to the standard MFCC front-end.
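As a rough illustration of the three building blocks named in the abstract, the sketch below shows a wavelet packet decomposition, soft thresholding, and AR coefficient estimation in plain Python. It is not the authors' implementation. Several choices are assumptions made only for illustration: the Haar wavelet, the standard (unmodified) soft thresholding rule, the autocorrelation method with the Levinson-Durbin recursion, and all function names. The abstract does not specify the wavelet, the modified thresholding rule, or the AR estimator.

```python
import math

def haar_step(x):
    """One Haar analysis step: returns (approximation, detail) half-bands.
    Assumption: Haar filters; a trailing odd sample is dropped."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def wavelet_packet(x, depth):
    """Full wavelet packet tree: both bands are split again at every level.
    Returns the 2**depth leaf subbands in natural (not frequency) order."""
    nodes = [list(x)]
    for _ in range(depth):
        nodes = [band for node in nodes for band in haar_step(node)]
    return nodes

def soft_threshold(coeffs, t):
    """Standard soft thresholding: shrink each magnitude by t, clamp at zero.
    The paper's modified procedure is not reproduced here."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]

def ar_coefficients(x, order):
    """AR (LPC) coefficients via autocorrelation + Levinson-Durbin.
    Returns a[1..order] with x[n] ~= sum_k a[k] * x[n - k]."""
    n = len(x)
    r = [sum(x[i] * x[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    if r[0] == 0.0:                      # silent frame: no model to fit
        return [0.0] * order
    a = [0.0] * (order + 1)
    err = r[0]                           # prediction error energy
    for m in range(1, order + 1):
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / err                  # reflection coefficient
        new_a = a[:]
        new_a[m] = k_m
        for k in range(1, m):
            new_a[k] = a[k] - k_m * a[m - k]
        a = new_a
        err *= 1.0 - k_m * k_m
    return a[1:]

# Hypothetical usage on one toy frame
frame = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
subbands = wavelet_packet(frame, depth=2)            # 4 subbands of 2 coefficients
denoised = [soft_threshold(b, 0.5) for b in subbands]
lpc = ar_coefficients(frame, order=2)
```

In a front-end of the kind the abstract describes, subband coefficients would presumably be thresholded before feature computation, and the AR features would be computed per frame; how the two feature streams are weighted and merged depends on the modified PCA and the voiced/unvoiced decision, which this sketch does not attempt.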
Bibliographic reference. Kotnik, Bojan / Kacic, Zdravko / Horvat, Bogomir (2003): "Noise robust speech parameterization based on joint wavelet packet decomposition and autoregressive modeling", In EUROSPEECH-2003, 1793-1796.