In reverberant environments there are long-term interactions between speech and corrupting sources. In this paper a time delay neural network (TDNN) architecture, capable of learning long-term temporal relationships and translation-invariant representations, is used for reverberation-robust acoustic modeling. Further, i-vectors are used as an input to the neural network to perform instantaneous speaker and environment adaptation, providing a 10% relative improvement in word error rate (WER). By sub-sampling the outputs of TDNN layers across time steps, training time is reduced. Using a parallel training algorithm, we show that the TDNN can be trained on ~5500 hours of speech data in 3 days using up to 32 GPUs. The TDNN is shown to provide results competitive with state-of-the-art systems in the IARPA ASpIRE challenge, with 27.7% WER on the dev_test set.
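To illustrate why sub-sampling TDNN layer outputs across time saves computation, the sketch below uses a hypothetical set of per-layer splice offsets (assumed values chosen for illustration, not the configuration used in the paper). Starting from a single top-layer output frame, it walks down to the input and collects only the time steps whose activations are actually needed. It then compares that count with evaluating every time step inside each layer's span.

```python
from typing import List, Set

# Hypothetical splice offsets for four TDNN layers (bottom layer first).
# These values are illustrative assumptions, not the paper's configuration.
SPLICE_OFFSETS: List[List[int]] = [
    [-2, -1, 0, 1, 2],  # layer 1: contiguous window over input feature frames
    [-1, 2],            # layer 2: splices only two time steps of layer 1
    [-3, 3],            # layer 3
    [-7, 2],            # layer 4
]


def frames_needed(offsets_per_layer: List[List[int]], output_frame: int = 0) -> List[Set[int]]:
    """Return, for each level (index 0 = input features, last = top-layer output),
    the time indices whose activations are needed to compute one top-layer output."""
    needed: List[Set[int]] = [{output_frame}]
    # Walk from the top layer down, expanding each needed frame by that layer's offsets.
    for offsets in reversed(offsets_per_layer):
        below = {t + o for t in needed[0] for o in offsets}
        needed.insert(0, below)
    return needed


def span_size(frames: Set[int]) -> int:
    """Number of time steps in the contiguous span covering `frames`."""
    return max(frames) - min(frames) + 1


if __name__ == "__main__":
    levels = frames_needed(SPLICE_OFFSETS)
    for i, frames in enumerate(levels):
        name = "input" if i == 0 else f"layer {i}"
        print(f"{name}: {len(frames)} needed, span of {span_size(frames)}")
    above_input = levels[1:]
    print("activations above input, sub-sampled:", sum(len(f) for f in above_input))
    print("activations above input, every step in span:", sum(span_size(f) for f in above_input))
```

With these assumed offsets, one output frame depends on input frames t-13 through t+9, yet only 14 activations above the input layer are evaluated instead of 46 if every time step in each layer's span were computed. The gaps between offsets are what make this possible: neighbouring output frames can reuse the same sparse set of lower-layer activations.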
Bibliographic reference. Peddinti, Vijayaditya / Chen, Guoguo / Povey, Daniel / Khudanpur, Sanjeev (2015): "Reverberation robust acoustic modeling using i-vectors with time delay neural networks", In INTERSPEECH-2015, 2440-2444.