Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Auditory Teager Energy Cepstrum Coefficients for Robust Speech Recognition

Dimitrios Dimitriadis (1), Petros Maragos (1), Alexandros Potamianos (2)

(1) National Technical University of Athens, Greece; (2) Technical University of Crete, Greece

In this paper, a feature extraction algorithm for robust speech recognition is introduced. The algorithm is motivated by human auditory processing and by the nonlinear Teager-Kaiser energy operator, which estimates the true energy of the source of a resonance. The proposed features are labeled Teager Energy Cepstrum Coefficients (TECCs). TECCs are computed by first filtering the speech signal through a dense, non-constant-Q Gammatone filterbank and then estimating the "true" energy of the signal's source, i.e., the short-time average of the output of the Teager-Kaiser energy operator. Error analysis and speech recognition experiments show that the TECCs and the mel frequency cepstrum coefficients (MFCCs) perform similarly for clean recording conditions, while the TECCs perform significantly better than the MFCCs for noisy recognition tasks. Specifically, a relative word error rate improvement of 60% over the MFCC baseline is shown for the Aurora-3 database in the high-mismatch condition. An absolute error rate improvement ranging from 5% to 20% is shown for a phone recognition task in various types of additive noise.
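The energy-estimation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the standard discrete form of the Teager-Kaiser operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], and it takes the log-plus-DCT cepstral step as an assumption borrowed from the usual MFCC recipe. The Gammatone filterbank is omitted, and all function names, frame lengths, and the flooring constant are hypothetical choices made for illustration.

```python
import math


def teager_energy(x):
    """Discrete Teager-Kaiser operator: psi[n] = x[n]^2 - x[n-1]*x[n+1].

    The first and last samples are dropped because they lack a neighbour.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def short_time_mean(values, frame_len, hop):
    """Average `values` over frames of `frame_len` samples, advancing by `hop`."""
    return [
        sum(values[s:s + frame_len]) / frame_len
        for s in range(0, len(values) - frame_len + 1, hop)
    ]


def dct2(v):
    """Unnormalised DCT-II of a sequence."""
    n = len(v)
    return [
        sum(v[k] * math.cos(math.pi * (k + 0.5) * i / n) for k in range(n))
        for i in range(n)
    ]


def cepstrum_from_band_energies(energies, n_ceps, floor=1e-12):
    """Log-compress per-band energies and keep the first n_ceps DCT terms.

    Assumption: this mirrors the MFCC-style cepstral step; the abstract
    does not spell out the compression or transform details.
    """
    logs = [math.log(max(e, floor)) for e in energies]
    return dct2(logs)[:n_ceps]


# Sanity check on a pure tone x[n] = A*cos(omega*n): the operator returns
# the constant A^2 * sin^2(omega), which tracks amplitude and frequency.
A, omega = 2.0, 0.3
x = [A * math.cos(omega * n) for n in range(200)]
psi = teager_energy(x)
frames = short_time_mean(psi, frame_len=50, hop=25)
```

In a full front end, `teager_energy` and `short_time_mean` would be applied to each filterbank channel output, and the resulting per-band energies for one frame would be passed to `cepstrum_from_band_energies`.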


Bibliographic reference. Dimitriadis, Dimitrios / Maragos, Petros / Potamianos, Alexandros (2005): "Auditory Teager energy cepstrum coefficients for robust speech recognition", in INTERSPEECH-2005, 3013-3016.