Even though deep neural network acoustic models provide an increased degree of robustness in automatic speech recognition, there is still a large performance drop in the task of far-field speech recognition in reverberant and noisy environments. In this study, we explore DNN adaptation techniques to achieve improved robustness to environmental mismatch for far-field speech recognition. In contrast to many recent studies investigating the role of feature processing in DNN-HMM systems, we focus on adapting a clean-trained DNN model to speech data captured by a distant-talking microphone in a target environment with substantial reverberation and noise. We show that significant performance gains can be obtained by discriminatively estimating a set of adaptation parameters to compensate for the mismatch between a clean-trained model and a small set of noisy and reverberant adaptation data. Using various adaptation strategies, relative word error rate improvements of up to 16% were obtained on the single-channel task of the recent ASpIRE challenge.
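The core idea described above, keeping a clean-trained network fixed and discriminatively estimating a small set of adaptation parameters on noisy and reverberant data, can be illustrated with a linear-input-network (LIN) style sketch: an affine feature transform, initialized to identity, trained by backpropagation through the frozen model. This is a minimal toy illustration under assumed dimensions and a random toy network; it is not the paper's actual architecture, features, or training recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy frozen "clean-trained" model: one ReLU hidden layer + softmax outputs.
# D = feature dimension, H = hidden units, C = output classes (all illustrative).
D, H, C = 10, 32, 5
W1 = rng.normal(scale=0.3, size=(H, D)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.3, size=(C, H)); b2 = np.zeros(C)

def forward(x, A, c):
    """Apply the frozen clean model to LIN-adapted features A @ x + c."""
    z = A @ x + c                       # adaptation transform (the only trained part)
    h = np.maximum(W1 @ z + b1, 0.0)    # frozen hidden layer
    logits = W2 @ h + b2
    e = np.exp(logits - logits.max())
    return z, h, e / e.sum()

def adapt(X, y, steps=200, lr=0.05):
    """Discriminatively estimate (A, c) by cross-entropy gradient descent
    on a small adaptation set, keeping all clean-model weights fixed."""
    A, c = np.eye(D), np.zeros(D)       # identity init: start from the clean model
    for _ in range(steps):
        gA, gc = np.zeros_like(A), np.zeros_like(c)
        for x, t in zip(X, y):
            z, h, p = forward(x, A, c)
            dlogits = p.copy(); dlogits[t] -= 1.0   # softmax cross-entropy gradient
            dz = W1.T @ ((W2.T @ dlogits) * (h > 0))
            gA += np.outer(dz, x); gc += dz
        A -= lr * gA / len(X); c -= lr * gc / len(X)
    return A, c
```

A usage sketch: simulate environmental mismatch with an unknown channel distortion, then verify that the adapted transform lowers cross-entropy on the mismatched data relative to the unadapted (identity) model.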
Bibliographic reference. Mirsamadi, Seyedmahdad / Hansen, John H. L. (2015): "A study on deep neural network acoustic model adaptation for robust far-field speech recognition", in Proc. INTERSPEECH 2015, pp. 2430-2434.