In the light of the improvements that were made in the last years with neural network-based acoustic models, it is an interesting question whether these models are also suited for noise-robust recognition. This has not yet been fully explored, although first experiments confirm this question. Furthermore, preprocessing techniques that improve the robustness should be re-evaluated with these new models. In this work, we present experimental results to address these questions. Acoustic models based on Gaussian mixture models (GMMs), deep neural networks (DNNs), and long short-term memory (LSTM) recurrent neural networks (which have an improved ability to exploit context) are evaluated for their robustness after clean or multi-condition training. In addition, the influence of non-negative matrix factorization (NMF) for speech enhancement is investigated. Experiments are performed with the Aurora-4 database and the results show that DNNs perform slightly better than LSTMs and, as expected, both beat GMMs. Furthermore, speech enhancement is capable of improving the DNN result.
Bibliographic reference. Geiger, Jürgen T. / Gemmeke, Jort F. / Schuller, Björn / Rigoll, Gerhard (2014): "Investigating NMF speech enhancement for neural network based acoustic models", In INTERSPEECH-2014, 2405-2409.