EUROSPEECH 2003 - INTERSPEECH 2003
It is well known that additive and channel noise cause shifts and scaling in MFCC features. Empirical normalization techniques that estimate and compensate for these effects, such as cepstral mean subtraction and variance normalization, have been shown to be useful. However, these empirical estimates may not be optimal. In this paper, we approach the problem from two directions: 1) using more robust MFCC-based features that are less sensitive to additive and channel noise, and 2) proposing a maximum likelihood (ML) based approach to compensate for the noise effect. In addition, we propose the use of multi-class normalization, in which different normalization factors can be applied to different phonetic units. The combination of the robust features and ML normalization is particularly useful for the highly mismatched condition of the Aurora 3 corpus, giving a 15.8% relative improvement in the highly mismatched case and a 10.4% relative improvement on average over the three conditions.
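For readers unfamiliar with the empirical baseline mentioned above, the following is a minimal sketch of per-utterance cepstral mean subtraction and variance normalization. It is not the ML normalization proposed in the paper, whose details are not given in this abstract. The function name `cmvn`, the `eps` parameter, and the list-of-lists frame layout are assumptions made for illustration only.

```python
import math


def cmvn(frames, eps=1e-8):
    """Per-utterance cepstral mean and variance normalization (illustrative).

    frames: list of feature vectors (lists of floats), one per frame,
            all of the same dimension. This layout is an assumption.
    eps:    small constant to avoid division by zero (hypothetical parameter).
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    # Empirical mean of each cepstral coefficient over the utterance.
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Empirical (population) standard deviation of each coefficient.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    # Subtract the mean (removes a shift) and divide by the standard
    # deviation (removes a positive scaling).
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)]
        for f in frames
    ]


# Toy two-dimensional "features" and a shifted, scaled copy of them.
clean = [[1.0, 10.0], [3.0, 30.0]]
distorted = [[2.0 * v + 5.0 for v in f] for f in clean]
normalized_clean = cmvn(clean)
normalized_distorted = cmvn(distorted)
```

In this toy case the clean and distorted features normalize to nearly the same values, which is the sense in which the empirical estimates compensate for a shift and a scaling. Real MFCC distortions are only approximately of this form.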
Bibliographic reference. Lai, Yiu-Pong / Siu, Man-Hung (2003): "Maximum likelihood normalization for robust speech recognition", In EUROSPEECH-2003, 13-16.