7th International Conference on Spoken Language Processing
September 16-20, 2002
In this paper we propose a filter bank structure derived using an admissible wavelet packet transform. The filters have Mel-scale spacing and, because they are built on the wavelet transform, are easy to implement and offer higher resolution in the time-frequency domain. The features are obtained by first calculating the energy in each filter band and then applying the Discrete Cosine Transform (DCT) to the energy vector. We evaluate the recognition performance of the features derived from the Mel-Scaled Wavelet Filter (MSWF) bank structure and compare it with that of Mel Frequency Cepstral Coefficient (MFCC) features. Experimental results on phoneme recognition from the TIMIT database show that features derived using MSWF perform better than MFCC features for unvoiced stops and unvoiced fricatives. Furthermore, the noise performance of these features is also found to be better than that of MFCC features.
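The two feature-extraction steps named in the abstract (energy per filter band, then a DCT over the energy vector) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the Haar filter, the 256-sample frame, and the six-leaf tree in `LEAVES` are assumptions chosen only to keep the example short, and they are not the Mel-spaced tree or the wavelet used in the paper. The abstract also does not say whether the energies are log-compressed before the DCT (as is usual for MFCC), so no logarithm is applied here.

```python
import numpy as np

def haar_split(x):
    """One wavelet packet split with the orthonormal Haar filter pair.
    Returns (low-pass, high-pass) halves; x must have even length."""
    x = np.asarray(x, dtype=float)
    low = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    high = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return low, high

def packet_band_energies(frame, leaves):
    """Energy of the coefficients at each leaf of a wavelet packet tree.
    Each leaf is a path string: 'a' = low-pass branch, 'd' = high-pass branch."""
    energies = []
    for path in leaves:
        node = np.asarray(frame, dtype=float)
        for step in path:
            low, high = haar_split(node)
            node = low if step == "a" else high
        energies.append(float(np.sum(node ** 2)))
    return np.array(energies)

def dct2(v):
    """Unnormalised DCT-II: c[k] = sum_n v[n] * cos(pi * k * (2n + 1) / (2N))."""
    n = len(v)
    idx = np.arange(n)
    basis = np.cos(np.pi * np.outer(idx, 2 * idx + 1) / (2 * n))
    return basis @ v

# Hypothetical non-uniform tree (NOT the tree from the paper): finer bands at
# low frequencies, coarser at high frequencies. The leaves cover the whole tree
# without overlap and are listed in increasing frequency order; the high-pass
# branch mirrors the spectrum, so 'add' precedes 'ada' and 'dd' precedes 'da'.
LEAVES = ["aaa", "aad", "add", "ada", "dd", "da"]

rng = np.random.default_rng(0)
frame = rng.standard_normal(256)      # stand-in for one speech frame (assumed length)
energies = packet_band_energies(frame, LEAVES)
features = dct2(energies)             # one coefficient per band
```

Because the Haar pair is orthonormal and the leaves cover the tree completely, the band energies sum to the energy of the frame, which gives a quick sanity check on any tree definition. A Mel-like layout would come from choosing which nodes to split further so that the leaf bandwidths grow with frequency.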
Bibliographic reference. Farooq, O. / Datta, S. (2002): "Mel-scaled wavelet filter based features for noisy unvoiced phoneme recognition", In ICSLP-2002, 1017-1020.