Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Auditory Image Model Features for Automatic Speech Recognition

Mario E. Munich (1), Qiguang Lin (2)

(1) Evolution Robotics, USA; (2) AOL Voice Services, USA

Conventional speech recognition engines extract Mel Frequency Cepstral Coefficient (MFCC) features from incoming speech. This paper presents a novel approach to feature extraction in which speech is processed according to the Auditory Image Model, a model of human psychoacoustics. We first describe the proposed front-end, then present recognition results obtained with the TIMIT database. Compared with previously published results on the same task, the new approach achieves a 10% improvement in recognition accuracy.

Bibliographic reference. Munich, Mario E. / Lin, Qiguang (2005): "Auditory image model features for automatic speech recognition", In INTERSPEECH-2005, 3037-3040.