8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

An Energy Normalization Scheme for Improved Robustness in Speech Recognition

Mohammad Ahadi (1), Hamid Sheikhzadeh (2), Robert Brennan (2), George Freeman (3)

(1) Amirkabir University, Iran
(2) Dspfactory Ltd., Canada
(3) University of Waterloo, Canada

The log energy parameter has long been used as an extension to the basic cepstral feature vector in speech recognition, and normalizing this parameter is also widely accepted practice. In this paper, a simple energy normalization scheme is introduced that allows direct use of the frame energy parameter in speech recognition and performs well in the presence of noise. Combined with traditional cepstral mean and variance normalization, it has led to error rate improvements of up to 55% on the Aurora 2 task, compared with the baseline clean-trained system using a feature set that includes the log energy parameter. This improvement is obtained without complicated programming or computationally expensive routines. Applied on an utterance-wide basis, the scheme performs close to off-line speaker-wide normalization, which makes it a good candidate for practical systems.
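As a point of reference, the traditional utterance-wide cepstral mean and variance normalization mentioned above can be sketched as follows. This is a minimal illustration, not the energy normalization scheme proposed in the paper, whose details the abstract does not give. The function names `frame_log_energy` and `cmvn`, the energy floor of 1e-10, and the toy two-dimensional feature vectors are all assumptions made for the example.

```python
import math


def frame_log_energy(frame):
    """Natural log of the summed squared samples of one frame.

    The floor of 1e-10 is an assumed value that avoids log(0) on silent frames.
    """
    energy = sum(s * s for s in frame)
    return math.log(max(energy, 1e-10))


def cmvn(features):
    """Normalize each feature dimension to zero mean and unit variance.

    features: list of frames from one utterance, each a list of floats
    of equal length. Statistics are computed over that utterance only.
    """
    n = len(features)
    dims = len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((f[d] - means[d]) ** 2 for f in features) / n
        # A constant dimension has zero variance; leave it unscaled.
        stds.append(math.sqrt(var) if var > 0 else 1.0)
    return [[(f[d] - means[d]) / stds[d] for d in range(dims)] for f in features]


# Toy utterance: three frames, two feature dimensions (hypothetical values).
utterance = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = cmvn(utterance)
```

Because the statistics come from a single utterance, this form needs no speaker-level data collected in advance, which is the practical appeal of utterance-wide normalization noted in the abstract.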


Bibliographic reference. Ahadi, Mohammad / Sheikhzadeh, Hamid / Brennan, Robert / Freeman, George (2004): "An energy normalization scheme for improved robustness in speech recognition", in INTERSPEECH-2004, 1649-1652.