To make speech recognisers robust to noise, either the features or the models can be compensated. Feature enhancement is often fast; model compensation is often more accurate, because it predicts the corrupted speech distribution and can therefore, for example, take uncertainty about the clean speech into account. This paper re-analyses the recently proposed predictive linear transformations for noise compensation as minimising the KL divergence between the predicted corrupted speech and the adapted models. New schemes are then introduced which apply observation-dependent transformations in the front-end to adapt the back-end distributions. One of them applies transforms in exactly the same manner as the popular minimum mean square error (MMSE) feature enhancement scheme, and is as fast. The new method performs better on AURORA 2.
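As a rough illustration of what applying observation-dependent transformations in the front-end can look like, the sketch below uses the generic form often associated with MMSE-style enhancement: a front-end Gaussian mixture over corrupted speech gives component posteriors for each observation, and the output is the posterior-weighted sum of per-component affine transforms. This is a sketch under assumptions, not the paper's method: the function names are hypothetical, the diagonal covariances and diagonal transforms are simplifications chosen for brevity, and the estimation of the transforms (for instance by minimising a KL divergence) is not shown.

```python
import math


def log_gauss_diag(y, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at y."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (yi - m) ** 2 / v)
        for yi, m, v in zip(y, mean, var)
    )


def component_posteriors(y, weights, means, variances):
    """Posterior P(k | y) of each front-end mixture component (assumed GMM)."""
    log_joint = [
        math.log(w) + log_gauss_diag(y, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(log_joint)  # subtract the maximum for numerical stability
    unnorm = [math.exp(lj - top) for lj in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def transform_observation(y, weights, means, variances, scales, offsets):
    """Posterior-weighted sum of per-component transforms a_k * y + b_k.

    scales and offsets hold one diagonal scale vector and one offset vector
    per component; both are hypothetical, pre-estimated parameters.
    """
    post = component_posteriors(y, weights, means, variances)
    out = [0.0] * len(y)
    for p, a, b in zip(post, scales, offsets):
        for d in range(len(y)):
            out[d] += p * (a[d] * y[d] + b[d])
    return out
```

Because the component posteriors depend on the observation, the effective transform changes from frame to frame, while the per-frame cost stays at one posterior computation plus a weighted sum.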
Bibliographic reference. Dalen, R. C. van / Flego, F. / Gales, M. J. F. (2009): "Transforming features to compensate speech recogniser models for noise", In INTERSPEECH-2009, 2499-2502.