Model-based feature enhancement is an ASR front-end technique that increases the robustness of the recogniser in noisy environments. However, its MMSE estimates of the clean speech feature vectors are based only on the static components at the current frame. In this paper, we show how the Kalman filter framework can be seen as a natural extension that incorporates both the current and the previous frames in the enhancement process. Because multiple Kalman filters are run in parallel, the global clean speech estimate is given by a weighted linear combination of the individual MMSE estimates. In addition, the unscented transformation is considered as a way to avoid linearising the cepstral-domain observation equation. We present experimental results on the Aurora2 database for both multi-modal Kalman and unscented Kalman filter feature enhancement.
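The weighted combination of parallel Kalman filter estimates mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a scalar, linear state and observation model (the paper works with a non-linear cepstral-domain observation equation), and it assumes the weights are the normalised product of a prior and each filter's innovation likelihood. The names `kalman_step`, `combine_estimates`, and all parameter values are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a scalar Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def kalman_step(x_prev, p_prev, y, a, q, r):
    """One predict/update step of a scalar Kalman filter.

    Assumed toy model (not the paper's): x_t = a * x_{t-1} + w, w ~ N(0, q),
    and y_t = x_t + v, v ~ N(0, r).
    Returns the posterior mean, posterior variance, and innovation likelihood.
    """
    x_pred = a * x_prev                 # predicted state from the previous frame
    p_pred = a * a * p_prev + q         # predicted variance
    s = p_pred + r                      # innovation variance
    gain = p_pred / s                   # Kalman gain
    x_post = x_pred + gain * (y - x_pred)
    p_post = (1.0 - gain) * p_pred
    likelihood = gaussian_pdf(y, x_pred, s)
    return x_post, p_post, likelihood


def combine_estimates(estimates, likelihoods, priors):
    """Weighted linear combination of the individual filter estimates.

    Assumption: weights are proportional to prior * innovation likelihood.
    """
    unnormalised = [pi * lik for pi, lik in zip(priors, likelihoods)]
    total = sum(unnormalised)
    weights = [u / total for u in unnormalised]
    x_global = sum(w * x for w, x in zip(weights, estimates))
    return x_global, weights


# Two hypothetical filters with different dynamics, run on the same observation.
models = [{"a": 0.9, "q": 0.1}, {"a": 0.5, "q": 0.5}]
results = [kalman_step(1.0, 1.0, 0.8, m["a"], m["q"], 0.2) for m in models]
x_global, weights = combine_estimates(
    [res[0] for res in results], [res[2] for res in results], [0.5, 0.5]
)
```

In this sketch each filter carries information from the previous frame through `x_prev` and `p_prev`, and the global estimate always lies between the smallest and largest individual estimate because the weights are non-negative and sum to one.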
Cite as: Stouten, V., Van Hamme, H., Wambacq, P. (2005) Kalman and unscented Kalman filter feature enhancement for noise robust ASR. Proc. Interspeech 2005, 953-956, doi: 10.21437/Interspeech.2005-227
@inproceedings{stouten05_interspeech,
  author={Veronique Stouten and Hugo Van Hamme and Patrick Wambacq},
  title={{Kalman and unscented Kalman filter feature enhancement for noise robust ASR}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={953--956},
  doi={10.21437/Interspeech.2005-227}
}