16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

A Study on Deep Neural Network Acoustic Model Adaptation for Robust Far-Field Speech Recognition

Seyedmahdad Mirsamadi, John H. L. Hansen

University of Texas at Dallas, USA

Even though deep neural network (DNN) acoustic models provide an increased degree of robustness in automatic speech recognition, there is still a large performance drop in the task of far-field speech recognition in reverberant and noisy environments. In this study, we explore DNN adaptation techniques to achieve improved robustness to environmental mismatch for far-field speech recognition. In contrast to many recent studies investigating the role of feature processing in DNN-HMM systems, we focus on adaptation of a clean-trained DNN model to speech data captured by a distant-talking microphone in a target environment with substantial reverberation and noise. We show that significant performance gains can be obtained by discriminatively estimating a set of adaptation parameters to compensate for the mismatch between a clean-trained model and a small set of noisy and reverberant adaptation data. Using various adaptation strategies, relative word error rate improvements of up to 16% could be obtained on the single-channel task of the recent ASpIRE challenge.
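The abstract's core idea, discriminatively estimating a small set of adaptation parameters in front of a frozen clean-trained model, can be illustrated with a minimal linear-input-style sketch. This is not the paper's implementation: the tiny softmax "acoustic model", the simulated channel distortion, and all dimensions and hyperparameters below are illustrative assumptions; only the trainable input transform is updated on the (mismatched) adaptation data, mirroring the idea of compensating environmental mismatch with few parameters.

```python
# Sketch of linear-input-network (LIN) style adaptation: a trainable
# affine transform (A, c) is prepended to a frozen "clean-trained"
# classifier and estimated discriminatively (cross-entropy gradient
# descent) on a small mismatched adaptation set. Everything here is an
# illustrative toy, not the paper's system.
import numpy as np

rng = np.random.default_rng(0)
D, K, N = 10, 4, 200                 # feature dim, classes, adaptation frames

# Frozen "clean-trained" model: a single softmax layer (stand-in for a DNN).
W_clean = rng.normal(size=(K, D))
b_clean = rng.normal(size=K)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward(x, A, c):
    """Adaptation layer (A, c) followed by the frozen clean model."""
    return softmax((x @ A.T + c) @ W_clean.T + b_clean)

# Simulated mismatch: clean features pass through a fixed linear
# distortion plus bias (a crude stand-in for reverberation and noise).
x_clean = rng.normal(size=(N, D))
y = np.argmax(x_clean @ W_clean.T + b_clean, axis=1)   # pseudo targets
H = np.eye(D) + 0.3 * rng.normal(size=(D, D))          # channel distortion
x_far = x_clean @ H.T + 0.5                            # observed far-field

# Discriminatively estimate only (A, c); the clean model stays frozen.
# Since the logits are linear in (A, c), this objective is convex.
A, c = np.eye(D), np.zeros(D)
Y = np.eye(K)[y]                                       # one-hot targets
for _ in range(1000):
    p = forward(x_far, A, c)
    g = (p - Y) @ W_clean / N        # grad of cross-entropy w.r.t. layer output
    A -= 0.02 * g.T @ x_far
    c -= 0.02 * g.sum(axis=0)

acc_before = np.mean(np.argmax(forward(x_far, np.eye(D), np.zeros(D)), axis=1) == y)
acc_after = np.mean(np.argmax(forward(x_far, A, c), axis=1) == y)
print(f"frame accuracy before/after adaptation: {acc_before:.2f} / {acc_after:.2f}")
```

The design point mirrored here is that adaptation touches only a small parameter set (D² + D values) rather than retraining the full model, which is what makes estimation feasible from a small amount of noisy and reverberant adaptation data.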


Bibliographic reference.  Mirsamadi, Seyedmahdad / Hansen, John H. L. (2015): "A study on deep neural network acoustic model adaptation for robust far-field speech recognition", in INTERSPEECH-2015, pp. 2430-2434.