Unfolded Deep Recurrent Convolutional Neural Network with Jump Ahead Connections for Acoustic Modeling

Dung T. Tran, Marc Delcroix, Shigeki Karita, Michael Hentschel, Atsunori Ogawa, Tomohiro Nakatani


Recurrent neural networks (RNNs) with jump ahead connections have been used in computer vision tasks, but they have not been well investigated for automatic speech recognition (ASR). Meanwhile, unfolded RNNs have been shown to be effective models for acoustic modeling. This paper investigates how to design an unfolded deep RNN architecture in which the recurrent connections use a convolutional neural network (CNN) to model short-term dependencies between hidden states. In this study, our unfolded RNN architecture is a CNN that processes a sequence of input features sequentially. At each time step, the CNN takes as input a small block of the input features and the hidden-layer output from the preceding block, and computes the output of its hidden layer. In addition, by exploiting one or more jump ahead connections between time steps, our network can learn long-term dependencies more effectively. We carried out experiments on the CHiME 3 task that show the effectiveness of the proposed approach.
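
The block-wise recurrence described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify layer counts, kernel sizes, or how the jump ahead connection is combined with the other inputs, so everything below is an assumption made for illustration. The names `conv1d_same` and `unfolded_rcnn`, the single hidden layer, the 1-D kernels of width 3, the `tanh` nonlinearity, the additive combination, and the non-overlapping blocks are all hypothetical choices.

```python
import math
import random

def conv1d_same(signal, kernel):
    """1-D cross-correlation, zero-padded so the output length equals the input length."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out

def unfolded_rcnn(frames, block_size, jump, w_in, w_rec, w_jump):
    """Process frames block by block and return one hidden state per block.

    Assumed update (illustrative only, not taken from the paper):
        h[t] = tanh(conv(x_block[t], w_in) + conv(h[t-1], w_rec) + conv(h[t-jump], w_jump))
    `jump` is assumed to be >= 2, so the jump input differs from the previous state.
    """
    dim = block_size * len(frames[0])   # hidden state has the same size as a flattened block
    zeros = [0.0] * dim                 # stands in for states before the first block
    hidden = []
    for start in range(0, len(frames) - block_size + 1, block_size):
        # Flatten the current block of frames into one feature vector.
        block = [v for frame in frames[start:start + block_size] for v in frame]
        t = len(hidden)
        h_prev = hidden[t - 1] if t >= 1 else zeros
        h_jump = hidden[t - jump] if t >= jump else zeros
        a = conv1d_same(block, w_in)     # contribution of the input block
        b = conv1d_same(h_prev, w_rec)   # convolutional recurrent connection
        c = conv1d_same(h_jump, w_jump)  # jump ahead connection from `jump` blocks back
        hidden.append([math.tanh(x + y + z) for x, y, z in zip(a, b, c)])
    return hidden

random.seed(0)
# Toy input: 12 frames of 4 features each (arbitrary sizes chosen for the example).
frames = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(12)]

def random_kernel():
    return [random.uniform(-0.5, 0.5) for _ in range(3)]

states = unfolded_rcnn(frames, block_size=2, jump=3,
                       w_in=random_kernel(), w_rec=random_kernel(), w_jump=random_kernel())
print(len(states), len(states[0]))  # 6 blocks, each with an 8-dimensional hidden state
```

In this sketch the jump ahead connection gives block t direct access to the hidden state from `jump` blocks earlier, so information can skip intermediate steps. A trained model would learn the kernels rather than draw them at random.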


DOI: 10.21437/Interspeech.2017-873

Cite as: Tran, D.T., Delcroix, M., Karita, S., Hentschel, M., Ogawa, A., Nakatani, T. (2017) Unfolded Deep Recurrent Convolutional Neural Network with Jump Ahead Connections for Acoustic Modeling. Proc. Interspeech 2017, 1596-1600, DOI: 10.21437/Interspeech.2017-873.


@inproceedings{Tran2017,
  author={Dung T. Tran and Marc Delcroix and Shigeki Karita and Michael Hentschel and Atsunori Ogawa and Tomohiro Nakatani},
  title={Unfolded Deep Recurrent Convolutional Neural Network with Jump Ahead Connections for Acoustic Modeling},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1596--1600},
  doi={10.21437/Interspeech.2017-873},
  url={http://dx.doi.org/10.21437/Interspeech.2017-873}
}