Over the years, the focus in noise-robust speech recognition has shifted from noise-robust features to model-based techniques such as parallel model combination and uncertainty decoding. In this paper, we contrast prime examples of both approaches in the context of large-vocabulary recognition systems such as those used for automatic audio indexing and transcription. We look at the approximations the techniques require to keep the computational load reasonable, the resulting computational cost, and the accuracy measured on the Aurora4 benchmark. The results show that a well-designed feature-based scheme can provide recognition accuracies at least as good as those of the model-based approaches at a substantially lower computational cost.
Bibliographic reference. Demuynck, Kris / Zhang, Xueru / Van Compernolle, Dirk / Van hamme, Hugo (2010): "Feature versus model based noise robustness", In INTERSPEECH-2010, 721-724.