ASR2000 - Automatic Speech Recognition: Challenges for the new Millenium
September 18-20, 2000
This paper describes an approach to normalizing the noise level of a speech signal at the outputs of the Mel-scaled filter bank used in MFCC feature extraction. An adaptive normalizing function that distinguishes between the speech and silence parts of the signal normalizes the noise level without altering the speech parts. This technique is combined with an adaptation of the reference vectors that depends on the average norm of the incoming feature vectors. On a database with training data recorded in an office environment and testing data recorded in moving cars, the word error rate was reduced from 35.5% to 14.7% on the city traffic test set and from 78.0% to 24.1% on the highway test set.
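To put the reported error rates in perspective, the relative word error rate reduction can be computed directly from the figures in the abstract. The following is a small illustrative sketch; the function name `relative_reduction` and the dictionary layout are assumptions made for this example and do not come from the paper.

```python
def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Return the relative word error rate reduction in percent."""
    return 100.0 * (baseline_wer - improved_wer) / baseline_wer


# Word error rates (in percent) as reported in the abstract:
# (baseline, with noise level normalization and reference adaptation)
reported = {
    "city traffic": (35.5, 14.7),
    "highway": (78.0, 24.1),
}

for test_set, (baseline, improved) in reported.items():
    rel = relative_reduction(baseline, improved)
    print(f"{test_set}: {baseline}% -> {improved}% ({rel:.1f}% relative reduction)")
```

Under this calculation, the reported results correspond to relative reductions of about 58.6% on the city traffic test set and about 69.1% on the highway test set.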
Bibliographic reference. Hilger, Florian / Ney, Hermann (2000): "Noise level normalization and reference adaptation for robust speech recognition", In ASR-2000, 64-68.