Recurrent neural network architectures have been shown to efficiently model long-term temporal dependencies between acoustic events. However, recurrent networks take longer to train than feed-forward networks because of the sequential nature of the learning algorithm. In this paper, we propose a time delay neural network (TDNN) architecture that models long-term temporal dependencies with training times comparable to those of standard feed-forward DNNs. The network uses sub-sampling to reduce computation during training. On the Switchboard task, we show a relative improvement of 7.3% over the baseline DNN model. We present results on several large vocabulary continuous speech recognition (LVCSR) tasks, with training data ranging from 3 to 1800 hours, to show that the TDNN architecture is effective at learning wider temporal dependencies in both small- and large-data scenarios, with an average relative improvement of 5.5%.
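The effect of sub-sampling on computation can be illustrated with a minimal sketch. It walks from the output layer down to the input and counts the time steps at which each layer must be evaluated to produce a single output frame. The per-layer splice offsets in `SUBSAMPLED` are an assumption chosen for illustration (they are not stated in this abstract), and `required_frames` and `full_splicing` are hypothetical helpers. The comparison network splices every frame within the same offset ranges, so both networks see the same total input context.

```python
# Hypothetical per-layer splice offsets (relative frame indices), listed from
# the first hidden layer to the last. Assumed for illustration only.
SUBSAMPLED = [
    [-2, -1, 0, 1, 2],  # layer 1: contiguous splice over the input features
    [-1, 2],            # layer 2: only two frames are spliced
    [-3, 3],            # layer 3
    [-7, 2],            # layer 4
    [0],                # layer 5
]


def full_splicing(offsets_per_layer):
    """Same context range per layer, but splice every frame in [min, max]."""
    return [list(range(min(o), max(o) + 1)) for o in offsets_per_layer]


def required_frames(offsets_per_layer, output_frame=0):
    """Return [input_frames, layer1_frames, ..., lastlayer_frames]: the sorted
    frame indices at which each level must be computed for one output frame."""
    needed = {output_frame}
    per_level = []
    for offsets in reversed(offsets_per_layer):
        # This layer is evaluated at every frame in `needed` ...
        per_level.append(sorted(needed))
        # ... so the level below must be available at each spliced offset.
        needed = {t + o for t in needed for o in offsets}
    per_level.append(sorted(needed))  # frames of the input features
    per_level.reverse()
    return per_level


if __name__ == "__main__":
    for name, offsets in [("sub-sampled", SUBSAMPLED),
                          ("full splicing", full_splicing(SUBSAMPLED))]:
        levels = required_frames(offsets)
        context = (levels[0][0], levels[0][-1])
        evals = [len(frames) for frames in levels[1:]]
        print(f"{name}: input context {context}, evaluations per layer {evals}")
```

With these assumed offsets, both variants cover the input context [-13, 9], but the sub-sampled network evaluates layers 1 to 5 at 7, 4, 2, 1 and 1 time steps, versus 19, 16, 10, 1 and 1 with full splicing. This per-output-frame count is only a rough proxy: during training, activations are shared across neighbouring output frames, and the actual savings depend on how those overlaps are exploited.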
Bibliographic reference. Peddinti, Vijayaditya / Povey, Daniel / Khudanpur, Sanjeev (2015): "A time delay neural network architecture for efficient modeling of long temporal contexts", In INTERSPEECH-2015, 3214-3218.