Far-Field ASR Without Parallel Data

Vijayaditya Peddinti, Vimal Manohar, Yiming Wang, Daniel Povey, Sanjeev Khudanpur


In far-field speech recognition systems, training acoustic models with alignments generated from parallel close-talk microphone data provides significant improvements. However, it is not practical to assume the availability of large corpora of parallel close-talk microphone data for training. In this paper we explore methods to reduce the performance gap between far-field ASR systems trained with alignments from distant microphone data and those trained with alignments from parallel close-talk microphone data. These methods include the use of a lattice-free sequence objective function, which tolerates minor misalignment errors, and the use of data selection techniques to discard badly aligned data. We present results on single distant microphone and multiple distant microphone scenarios of the AMI LVCSR task, and we identify prominent causes of alignment errors in AMI data.
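The data selection idea mentioned in the abstract can be illustrated with a minimal sketch: score each utterance by its average per-frame alignment log-likelihood and discard utterances below a threshold. The utterance IDs, scores, and threshold below are purely illustrative assumptions, not values or code from the paper.

```python
# Hypothetical sketch of alignment-based data selection: keep utterances whose
# mean per-frame alignment log-likelihood meets a threshold, discard the rest.
# IDs, scores, and the threshold are illustrative, not taken from the paper.

def select_utterances(scores, threshold=-8.0):
    """Partition utterance IDs by mean per-frame alignment log-likelihood.

    scores: dict mapping utterance ID -> list of per-frame log-likelihoods.
    Returns (kept, discarded) lists of utterance IDs.
    """
    kept, discarded = [], []
    for utt_id, per_frame_loglikes in scores.items():
        avg = sum(per_frame_loglikes) / len(per_frame_loglikes)
        (kept if avg >= threshold else discarded).append(utt_id)
    return kept, discarded

if __name__ == "__main__":
    scores = {
        "utt_0001": [-5.1, -6.0, -4.8],     # plausibly well aligned
        "utt_0002": [-12.3, -11.7, -13.0],  # plausibly badly aligned
    }
    kept, discarded = select_utterances(scores)
    print(kept, discarded)  # ['utt_0001'] ['utt_0002']
```

In practice such scores would come from forced alignment of the distant-microphone audio; the sketch only shows the thresholding step.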


DOI: 10.21437/Interspeech.2016-1475

Cite as

Peddinti, V., Manohar, V., Wang, Y., Povey, D., Khudanpur, S. (2016) Far-Field ASR Without Parallel Data. Proc. Interspeech 2016, 1996-2000.

Bibtex
@inproceedings{Peddinti+2016,
  author={Vijayaditya Peddinti and Vimal Manohar and Yiming Wang and Daniel Povey and Sanjeev Khudanpur},
  title={Far-Field ASR Without Parallel Data},
  year=2016,
  booktitle={Interspeech 2016},
  doi={10.21437/Interspeech.2016-1475},
  url={http://dx.doi.org/10.21437/Interspeech.2016-1475},
  pages={1996--2000}
}