Microphone Distance Adaptation Using Cluster Adaptive Training for Robust Far Field Speech Recognition

Animesh Prasad, Khe Chai Sim


Microphone distance adaptation is an important and challenging problem for far-field speech recognition using a single distant microphone. This paper investigates the use of Cluster Adaptive Training (CAT) to learn a structured Deep Neural Network (DNN) that can be quickly adapted to changes in the distance between the microphone and the speaker at test time. A speech corpus was created by re-recording the Wall Street Journal (WSJ0) audio using far-field microphones at 8 different distances from the source. Experimental results show that unsupervised adaptation of the CAT-DNN model achieves up to a 0.9% absolute reduction in word error rate compared to the canonical model trained on multi-style data.
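
The abstract does not spell out the model structure, so the following is only a minimal sketch of the general idea behind cluster adaptive training as it is commonly described: a layer's weights are formed as a weighted sum of per-cluster weight matrices, and only the short interpolation vector is re-estimated for a new condition. The function names, the two toy clusters, and all numeric values are assumptions made for illustration and are not taken from the paper.

```python
# Hypothetical sketch of cluster interpolation in a CAT-style layer.
# Names, shapes, and values are illustrative assumptions, not the paper's setup.

def interpolate_weights(cluster_weights, interpolation):
    """Combine per-cluster weight matrices into one adapted matrix.

    cluster_weights: list of matrices (lists of rows), one per cluster.
    interpolation: one scalar per cluster (the compact, adaptable part).
    """
    if len(cluster_weights) != len(interpolation):
        raise ValueError("need one interpolation weight per cluster")
    rows = len(cluster_weights[0])
    cols = len(cluster_weights[0][0])
    adapted = [[0.0] * cols for _ in range(rows)]
    for lam, matrix in zip(interpolation, cluster_weights):
        for i in range(rows):
            for j in range(cols):
                adapted[i][j] += lam * matrix[i][j]
    return adapted


def layer_output(adapted, inputs):
    """Apply the adapted matrix to an input vector (nonlinearity omitted)."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in adapted]


# Two toy clusters, loosely standing in for "near" and "far" conditions.
near = [[1.0, 0.0], [0.0, 1.0]]
far = [[0.0, 2.0], [2.0, 0.0]]

# Only this short vector would be re-estimated for a new test condition.
lam = [0.75, 0.25]
adapted = interpolate_weights([near, far], lam)
output = layer_output(adapted, [1.0, 2.0])
```

In a sketch like this, adapting to a new microphone distance means updating `lam` while the cluster matrices stay fixed, which is why such adaptation can work with little data.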


DOI: 10.21437/Interspeech.2016-738

Cite as

Prasad, A., Sim, K.C. (2016) Microphone Distance Adaptation Using Cluster Adaptive Training for Robust Far Field Speech Recognition. Proc. Interspeech 2016, 3823-3827.

Bibtex
@inproceedings{Prasad+2016,
author={Animesh Prasad and Khe Chai Sim},
title={Microphone Distance Adaptation Using Cluster Adaptive Training for Robust Far Field Speech Recognition},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-738},
url={http://dx.doi.org/10.21437/Interspeech.2016-738},
pages={3823--3827}
}