4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
It is commonly acknowledged that additive noise, convolutional noise, and speech level variations can seriously degrade the performance of a speech recognizer. When an auditory model is used as the acoustic front-end, compensation techniques such as spectral subtraction and log-spectral mean subtraction can be outperformed by time-domain techniques that operate on the band-pass filtered signals supplied to the haircell models. In earlier work we showed that additive noise can be removed effectively by placing center clippers in front of the haircell models. This technique, called linear noise magnitude subtraction (NMS), is further improved in this paper. The nonlinear NMS proposed here outperforms the linear one, especially at low signal-to-noise ratios. To compensate for speech level variations and convolutional noise, we have adopted the same philosophy: remove the effects before the signal is supplied to the haircell models. This is accomplished by introducing normalization gains in front of the haircell models. It is shown that this loudness mean normalization (LMN) technique, when used in combination with NMS, offers a highly robust speech representation.
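The two per-band operations named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed clipping threshold, the mean-magnitude level estimate, and the target level are all assumptions made for illustration. The abstract does not specify how the noise magnitude or the normalization gains are estimated, nor what the nonlinear NMS variant looks like.

```python
def center_clip(samples, threshold):
    """Textbook center clipper (hypothetical stand-in for linear NMS).

    Samples whose magnitude is below `threshold` are set to zero; the
    remaining samples have `threshold` subtracted from their magnitude.
    """
    clipped = []
    for x in samples:
        magnitude = abs(x) - threshold
        if magnitude <= 0.0:
            clipped.append(0.0)
        else:
            clipped.append(magnitude if x > 0 else -magnitude)
    return clipped


def normalize_gain(samples, target_level=1.0, eps=1e-12):
    """Scale one band so that its mean magnitude equals `target_level`.

    Assumed form of a normalization gain; the level estimate used in
    the paper may differ.
    """
    if not samples:
        return []
    mean_magnitude = sum(abs(x) for x in samples) / len(samples)
    gain = target_level / max(mean_magnitude, eps)
    return [gain * x for x in samples]


# One band-pass filtered channel, processed before a haircell model.
band = [0.5, -0.2, 2.0, -3.0]
denoised = center_clip(band, threshold=1.0)
normalized = normalize_gain([2.0, -2.0], target_level=1.0)
```

In this sketch each band-pass channel would be processed independently before being passed to its haircell model, which matches the abstract's description of both operations sitting in front of the haircell models.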
Bibliographic reference. Vereecken, Halewijn / Martens, Jean-Pierre (1996): "Noise suppression and loudness normalization in an auditory model-based acoustic front-end", In ICSLP-1996, 566-569.