Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Frequency-Domain Auditory Suppression Modelling (FASM) - A WDFT-Based Anthropomorphic Noise-Robust Feature Extraction Algorithm for Speech Recognition

Alexei V. Ivanov (1), Marek Parfieniuk (2), Alexander A. Petrovsky (2)

(1) Speech Technology Center, Russia (2) Bialystok Technical University, Poland

This paper presents a physiologically inspired feature extraction algorithm for speech recognition engines that must remain effective in noisy environments. Essentially, the algorithm simulates a key property of "active cochlea" models: a signal-dependent variable gain over the frequency range. To drastically reduce computational complexity compared with the original time-domain "active cochlea" models, the algorithm is implemented in the frequency domain using a warped discrete Fourier transform (WDFT). The essence of the FASM technique is that, in the presence of noise, higher-frequency channels receive more attenuation if there are "enough" signal components in the lower part of the spectrum, which is less susceptible to noise. As the reported measurements confirm, the FASM algorithm improves feature invariance to noise while keeping feature informativeness at an acceptable level.
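The abstract does not give the warping parameter, the channel layout, or the gain rule, so the following Python sketch only illustrates the general idea and should not be read as the authors' implementation. It assumes a first-order allpass frequency warp with a hypothetical coefficient `a`, evaluates the spectrum directly at the warped frequencies, and applies a made-up suppression rule: channels at or above a hypothetical `split` index are attenuated once the low-band share of the energy exceeds a hypothetical `threshold`. All function names and parameter values are illustrative assumptions.

```python
import cmath
import math


def warped_frequencies(n_bins, a=0.5):
    """Map a uniform grid on [0, pi] through a first-order allpass warp.

    Assumption: warping coefficient `a` (0 < a < 1) is illustrative; with
    a > 0 the returned analysis frequencies crowd toward 0.
    """
    freqs = []
    for k in range(n_bins):
        nu = math.pi * k / (n_bins - 1)
        shift = 2.0 * math.atan2(a * math.sin(nu), 1.0 + a * math.cos(nu))
        freqs.append(nu - shift)
    return freqs


def warped_power_spectrum(frame, freqs):
    """Evaluate |X(e^{jw})|^2 directly at each warped frequency.

    This is a plain O(N*K) sum, chosen for clarity rather than speed.
    """
    power = []
    for w in freqs:
        x = sum(s * cmath.exp(-1j * w * n) for n, s in enumerate(frame))
        power.append(abs(x) ** 2)
    return power


def suppression_gains(power, split, threshold=0.5, strength=4.0):
    """Hypothetical rule: attenuate channels >= split if the low band dominates.

    `split`, `threshold`, and `strength` are assumptions, not values from
    the paper.
    """
    total = sum(power)
    if total == 0.0:
        return [1.0] * len(power)
    low_ratio = sum(power[:split]) / total
    excess = max(0.0, low_ratio - threshold)
    high_gain = 1.0 / (1.0 + strength * excess)
    return [1.0 if k < split else high_gain for k in range(len(power))]


# Usage: a strong low-frequency tone plus a weak high-frequency component.
frame = [math.sin(0.2 * n) + 0.1 * math.sin(2.5 * n) for n in range(64)]
freqs = warped_frequencies(17)
power = warped_power_spectrum(frame, freqs)
gains = suppression_gains(power, split=8)
suppressed = [g * p for g, p in zip(gains, power)]
```

In this toy setup the low band holds most of the energy, so the upper channels are scaled down while the lower channels pass unchanged. A frame whose energy sits mainly in the upper channels would leave all gains at 1.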


Bibliographic reference.  Ivanov, Alexei V. / Parfieniuk, Marek / Petrovsky, Alexander A. (2005): "Frequency-domain auditory suppression modelling (FASM) - a WDFT-based anthropomorphic noise-robust feature extraction algorithm for speech recognition", In INTERSPEECH-2005, 713-716.