ASR2000 - Automatic Speech Recognition: Challenges for the new Millenium

September 18-20, 2000
Paris, France

Noise Level Normalization And Reference Adaptation For Robust Speech Recognition

Florian Hilger and Hermann Ney

Lehrstuhl für Informatik VI, RWTH Aachen, Germany

This paper describes an approach to normalizing the noise level of a speech signal at the outputs of the Mel-scaled filter bank used in MFCC feature extraction. An adaptive normalizing function that distinguishes between the speech and silence parts of the signal normalizes the noise level without altering the speech parts. This technique is combined with an adaptation of the reference vectors that depends on the average norm of the incoming feature vectors. On a database with training data recorded in an office environment and test data recorded in moving cars, the word error rate was reduced from 35.5% to 14.7% on the city traffic test set and from 78.0% to 24.1% on the highway test set.
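
The abstract does not give the normalizing function itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes log-compressed filter-bank outputs, estimates a per-channel noise level and speech level from low and high quantiles over the utterance, and shifts low-energy (silence-like) values toward a target noise level while leaving high-energy (speech-like) values unchanged. The names `quantile` and `normalize_noise_level`, the parameters `target_noise`, `low_q` and `high_q`, and the linear weighting are all hypothetical choices made for illustration.

```python
def quantile(values, q):
    """Return the q-quantile (0 <= q <= 1) by nearest rank on the sorted values."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, round(q * (len(ordered) - 1))))
    return ordered[index]


def normalize_noise_level(frames, target_noise, low_q=0.1, high_q=0.9):
    """Shift the noise floor of each filter-bank channel to target_noise.

    frames: list of frames, each a list of (assumed log) filter-bank outputs.
    Illustrative only: the quantile estimates and the linear weight are
    assumptions, not the function used in the paper.
    """
    n_channels = len(frames[0])
    normalized = [list(frame) for frame in frames]  # copy, do not modify input
    for c in range(n_channels):
        channel = [frame[c] for frame in frames]
        noise = quantile(channel, low_q)    # rough silence-level estimate
        speech = quantile(channel, high_q)  # rough speech-level estimate
        span = speech - noise
        for t, x in enumerate(channel):
            if span <= 0:
                weight = 1.0  # flat channel: treat every value as noise
            else:
                # 1.0 at or below the noise level, 0.0 at or above the speech level
                weight = min(1.0, max(0.0, (speech - x) / span))
            normalized[t][c] = x + weight * (target_noise - noise)
    return normalized


# Toy utterance: 5 silence-like frames followed by 5 speech-like frames, 2 channels.
frames = [[2.0, 3.0]] * 5 + [[8.0, 9.0]] * 5
result = normalize_noise_level(frames, target_noise=4.0)
print(result[0], result[-1])
```

In this toy input the silence-like frames in both channels are moved to the target level of 4.0, while the speech-like frames keep their original values, which mirrors the behaviour the abstract describes in words.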


Bibliographic reference.  Hilger, Florian / Ney, Hermann (2000): "Noise level normalization and reference adaptation for robust speech recognition", In ASR-2000, 64-68.