Sixth European Conference on Speech Communication and Technology
A robust and efficient front end is crucial for robust automatic speech recognition (ASR). Proper time and frequency resolution of the time-frequency representation (TFR) of speech, motivated by auditory models, is considered an important factor for robustness. An efficient method of realizing a variable-resolution TFR using the DTFT/Goertzel algorithm is proposed in place of the standard FFT-based approach. The new representation, called EarLyzer, is shown to be more robust than the FFT-based Mel-frequency cepstral coefficient (MFCC) representation on a speech recognition task in automobile noise.
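As a rough illustration of the idea named in the abstract, the Goertzel recursion can evaluate the DTFT of a frame at one arbitrary frequency, so each analysis frequency can be given its own window length. The sketch below is not the EarLyzer implementation: the function names, the (frequency, window length) pairs, and the length normalization are all assumptions made for illustration.

```python
import math


def goertzel_power(x, freq_hz, fs_hz):
    """Squared magnitude of the DTFT of x at freq_hz, via the Goertzel recursion."""
    w = 2.0 * math.pi * freq_hz / fs_hz
    coeff = 2.0 * math.cos(w)
    s1 = 0.0  # s[n-1]
    s2 = 0.0  # s[n-2]
    for sample in x:
        s0 = sample + coeff * s1 - s2
        s2 = s1
        s1 = s0
    # |s[N-1] - exp(-jw) * s[N-2]|^2, expanded into real arithmetic
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def variable_resolution_spectrum(frame, fs_hz, bands):
    """Evaluate one Goertzel filter per band.

    bands is a list of (center_freq_hz, window_length) pairs (hypothetical
    layout). Each band uses only the first window_length samples of the frame,
    so the time/frequency resolution trade-off can differ per band. Dividing
    by the squared length is an assumed normalization that keeps bands with
    different window lengths comparable.
    """
    spectrum = []
    for freq_hz, n in bands:
        segment = frame[:n]
        power = goertzel_power(segment, freq_hz, fs_hz)
        spectrum.append(power / (len(segment) ** 2))
    return spectrum


if __name__ == "__main__":
    fs = 8000
    frame = [math.sin(2.0 * math.pi * 1000.0 * k / fs) for k in range(256)]
    # Long window at the low frequency, short window at the high one.
    print(variable_resolution_spectrum(frame, fs, [(1000.0, 256), (2000.0, 64)]))
```

Unlike an FFT, which fixes one window length and a uniform frequency grid for all bins, each band here costs one second-order recursion over its own segment, which is what makes a per-band resolution choice inexpensive when only a modest number of bands is needed.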
Bibliographic reference. Avadhanulu, J. V. / Mathew, M. / Sreenivas, T. V. (1999): "EARLYZER: perceptualy motivated robust TFR of speech", In EUROSPEECH'99, 2765-2768.