Interspeech'2005 - Eurospeech
This paper presents a physiologically inspired feature extraction algorithm for speech recognition engines that must remain effective in noisy environments. Essentially, the algorithm simulates a key property of "active cochlea" models: a signal-dependent gain that varies over the frequency range. To drastically reduce computational complexity relative to the original time-domain "active cochlea" models, it is implemented in the frequency domain using a warped discrete Fourier transform (WDFT). The essence of the frequency-domain auditory suppression modelling (FASM) technique is that, in the presence of noise, higher-frequency channels are attenuated more strongly if there are "enough" signal components in the lower part of the spectrum, which is less susceptible to noise. The measurements performed confirm that the FASM algorithm improves the invariance of the features to noise while keeping their informativeness at an acceptable level.
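The suppression idea described above can be illustrated with a toy sketch. This is not the FASM algorithm from the paper: the function name `suppress_high_channels`, the `split` index separating "low" from "high" channels, the `threshold` on the low-frequency energy share, and the linear attenuation law capped at `max_atten_db` are all assumptions chosen for illustration, and plain per-channel energies stand in for a WDFT spectrum.

```python
def suppress_high_channels(energies, split, threshold=0.5, max_atten_db=12.0):
    """Toy signal-dependent gain (hypothetical, not the paper's method).

    `energies` holds per-channel energies ordered from low to high frequency.
    Channels at index >= `split` are attenuated when the share of total
    energy below `split` exceeds `threshold`.
    """
    total = sum(energies)
    if total <= 0.0:
        return list(energies)

    low_share = sum(energies[:split]) / total
    if low_share <= threshold:
        # Not "enough" low-frequency content: leave the spectrum unchanged.
        return list(energies)

    # Assumed law: attenuation depth grows linearly from 0 dB at the
    # threshold to max_atten_db when all energy is in the low channels.
    depth_db = max_atten_db * (low_share - threshold) / (1.0 - threshold)
    gain = 10.0 ** (-depth_db / 10.0)  # power gain applied to energies

    return [e if i < split else e * gain for i, e in enumerate(energies)]


strong_low = suppress_high_channels([4.0, 4.0, 1.0, 1.0], split=2)
weak_low = suppress_high_channels([1.0, 1.0, 4.0, 4.0], split=2)
```

In this sketch, `strong_low` keeps its two low channels and has its two high channels scaled down, while `weak_low` is returned unchanged because the low channels carry too little of the total energy.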
Bibliographic reference. Ivanov, Alexei V. / Parfieniuk, Marek / Petrovsky, Alexander A. (2005): "Frequency-domain auditory suppression modelling (FASM) - a WDFT-based anthropomorphic noise-robust feature extraction algorithm for speech recognition", In INTERSPEECH-2005, 713-716.