16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

A Time Delay Neural Network Architecture for Efficient Modeling of Long Temporal Contexts

Vijayaditya Peddinti, Daniel Povey, Sanjeev Khudanpur

Johns Hopkins University, USA

Recurrent neural network architectures have been shown to efficiently model long-term temporal dependencies between acoustic events. However, the training time of recurrent networks is higher than that of feedforward networks due to the sequential nature of the learning algorithm. In this paper we propose a time delay neural network (TDNN) architecture which models long-term temporal dependencies with training times comparable to standard feedforward DNNs. The network uses sub-sampling to reduce computation during training. On the Switchboard task we show a relative improvement of 7.3% over the baseline DNN model. We present results on several LVCSR tasks with training data ranging from 3 to 1800 hours to show the effectiveness of the TDNN architecture in learning wider temporal dependencies in both small and large data scenarios, with an average relative improvement of 5.5%.
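The core idea of the abstract can be illustrated with a minimal sketch: each TDNN layer splices frames from its input at a small set of time offsets and applies an affine transform plus nonlinearity, and sub-sampling means using a sparse offset set (e.g. {-2, +2}) instead of the full contiguous context. The function below is a hypothetical NumPy illustration, not the authors' implementation; the offset sets, dimensions, and ReLU nonlinearity are assumptions for the example.

```python
import numpy as np

def tdnn_layer(x, weights, bias, offsets):
    """One TDNN layer sketch (illustrative, not the paper's code).

    x       : (T, D_in) sequence of input frames
    weights : (D_out, D_in * len(offsets)) affine transform
    bias    : (D_out,)
    offsets : time offsets to splice, e.g. [-2, 0, 2] (sub-sampled context)
    Returns a shorter sequence: only frames with full context are kept.
    """
    lo, hi = min(offsets), max(offsets)
    outputs = []
    # Each output frame t sees input frames x[t + o] for every offset o.
    for t in range(-lo, x.shape[0] - hi):
        spliced = np.concatenate([x[t + o] for o in offsets])
        outputs.append(np.maximum(weights @ spliced + bias, 0.0))  # assumed ReLU
    return np.stack(outputs)

# Toy usage with made-up dimensions: 20 input frames, 5-dim features,
# a layer splicing a sub-sampled context {-2, 0, +2} into a 4-dim output.
rng = np.random.default_rng(0)
x = rng.standard_normal((20, 5))
W = rng.standard_normal((4, 5 * 3))
b = np.zeros(4)
h = tdnn_layer(x, W, b, [-2, 0, 2])
print(h.shape)  # output is shorter: edge frames lack full context
```

Stacking such layers with progressively wider offsets lets higher layers cover a long temporal context while each layer only computes over a few spliced frames, which is the source of the training-time savings the abstract claims.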

Bibliographic reference. Peddinti, Vijayaditya / Povey, Daniel / Khudanpur, Sanjeev (2015): "A time delay neural network architecture for efficient modeling of long temporal contexts", in INTERSPEECH 2015, 3214-3218.