Interspeech'2005 - Eurospeech
Conventional speech recognition engines extract Mel Frequency Cepstral Coefficient (MFCC) features from incoming speech. This paper presents a novel approach to feature extraction in which speech is processed according to the Auditory Image Model, a model of human psychoacoustics. We first describe the proposed front-end, then present recognition results obtained with the TIMIT database. Compared with previously published results on the same task, the new approach achieves a 10% improvement in recognition accuracy.
Bibliographic reference. Munich, Mario E. / Lin, Qiguang (2005): "Auditory image model features for automatic speech recognition", In INTERSPEECH-2005, 3037-3040.