INTERSPEECH 2004 - ICSLP
Making ASR noise robust requires some form of data normalisation to ensure that the distributions of acoustic features in the training and test conditions look similar. Usually, one attempts to compensate for the impact of noise by estimating the noise characteristics from the signal. In this paper we explore a new method that builds on a priori knowledge stored in clean speech models. Using Mel bank log-energy features, classical clean speech HMMs were replaced by models in which the model components corresponding to low energy are not considered during recognition. Application of the new method to clean matched data showed that recognition performance was equal to or better than the baseline when fewer than 45 percent of the model components were discarded. In the case of noisy data, the performance gains were marginal for the model component selections studied so far. Analysis of the results suggests that future research should focus on combining the new model-driven approach with data-driven methods.
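The idea of not considering low-energy model components can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: each HMM state is modelled by a single diagonal-covariance Gaussian over Mel bank log-energies, "low energy" is judged by ranking the components of the state mean, and a fixed fraction of the lowest-ranked components is dropped. The function names `select_high_energy` and `masked_log_likelihood` and the parameter `discard_fraction` are hypothetical and are not taken from the paper.

```python
import numpy as np


def select_high_energy(mean, discard_fraction):
    """Return a boolean mask that keeps the highest-energy components of a state mean.

    Assumption: components are ranked by their mean log-energy, and the
    lowest `discard_fraction` of them is discarded.
    """
    n_discard = int(np.floor(discard_fraction * mean.size))
    order = np.argsort(mean)  # ascending: lowest log-energy first
    mask = np.ones(mean.size, dtype=bool)
    mask[order[:n_discard]] = False
    return mask


def masked_log_likelihood(x, mean, var, mask):
    """Diagonal-Gaussian log-likelihood computed over the retained components only."""
    d = x[mask] - mean[mask]
    v = var[mask]
    return -0.5 * np.sum(np.log(2.0 * np.pi * v) + d * d / v)


# Toy state with four Mel bank channels (values are illustrative only).
mean = np.array([1.0, 5.0, 2.0, 8.0])
var = np.ones(4)
mask = select_high_energy(mean, discard_fraction=0.5)

clean = np.array([1.0, 5.0, 2.0, 8.0])
noisy = np.array([4.0, 5.0, 4.5, 8.0])  # noise fills only the low-energy channels

ll_clean = masked_log_likelihood(clean, mean, var, mask)
ll_noisy = masked_log_likelihood(noisy, mean, var, mask)
```

In this toy case the two scores are identical, because the noise affects only channels that the mask excludes. The sketch does not address how scores from states with different numbers of retained components should be compared during decoding.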
Bibliographic reference. Cranen, Bert / Veth, Johan de (2004): "Active perception: using a priori knowledge from clean speech models to ignore non-target features", In INTERSPEECH-2004, 2081-2084.