Output-Gate Projected Gated Recurrent Unit for Speech Recognition

Gaofeng Cheng, Daniel Povey, Lu Huang, Ji Xu, Sanjeev Khudanpur, Yonghong Yan


In this paper, we describe work on accelerating decoding while improving recognition accuracy. First, we propose an architecture that we call the Projected Gated Recurrent Unit (PGRU) for automatic speech recognition (ASR) tasks, and show that the PGRU consistently outperforms the standard GRU. Second, to improve the PGRU's generalization, especially on large-scale ASR tasks, we propose the Output-gate PGRU (OPGRU). Finally, we find that time delay neural network (TDNN) layers and normalization techniques benefit the proposed projection-based GRUs. The final unidirectional TDNN-OPGRU acoustic model achieves a 3.3% / 4.5% relative word error rate (WER) reduction over a bidirectional projected LSTM (BLSTMP) on the Eval2000 / RT03 test sets, while decoding around 2.6 times faster than the BLSTMP.
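To make the projection idea concrete, here is a minimal sketch of one time step of a projected GRU, in the spirit of the PGRU described above: a standard GRU whose recurrent feedback is a low-rank linear projection s of the full hidden state h. The weight names, dimensions, and exact gate wiring are illustrative assumptions for this sketch, not the paper's exact parameterization (the OPGRU variant further replaces the reset gate with an output gate, which is omitted here).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pgru_step(x, h_prev, s_prev, p):
    """One time step of a projected GRU (illustrative sketch).

    x      : input at this frame, shape (d_in,)
    h_prev : previous full hidden state, shape (d_cell,)
    s_prev : previous projected state, shape (d_proj,)
    p      : dict of weight matrices (assumed names, biases omitted)
    """
    xs = np.concatenate([x, s_prev])
    z = sigmoid(p["Wz"] @ xs)                                 # update gate, (d_cell,)
    r = sigmoid(p["Wr"] @ xs)                                 # reset gate, (d_proj,)
    h_tilde = np.tanh(p["Wh"] @ np.concatenate([x, r * s_prev]))  # candidate state
    h = (1.0 - z) * h_prev + z * h_tilde                      # full-dimensional state
    s = p["Wp"] @ h                                           # low-rank recurrent projection
    return h, s

# Usage with arbitrary (hypothetical) dimensions:
rng = np.random.default_rng(0)
d_in, d_cell, d_proj = 4, 6, 3
p = {"Wz": rng.standard_normal((d_cell, d_in + d_proj)),
     "Wr": rng.standard_normal((d_proj, d_in + d_proj)),
     "Wh": rng.standard_normal((d_cell, d_in + d_proj)),
     "Wp": rng.standard_normal((d_proj, d_cell))}
h, s = pgru_step(rng.standard_normal(d_in), np.zeros(d_cell), np.zeros(d_proj), p)
```

Because the recurrent matrices consume the projected state s (size d_proj) instead of the full state h (size d_cell), the recurrent parameter count and per-step compute shrink roughly by the ratio d_proj / d_cell, which is the source of the decoding speedup the abstract reports.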


DOI: 10.21437/Interspeech.2018-1403

Cite as: Cheng, G., Povey, D., Huang, L., Xu, J., Khudanpur, S., Yan, Y. (2018) Output-Gate Projected Gated Recurrent Unit for Speech Recognition. Proc. Interspeech 2018, 1793-1797, DOI: 10.21437/Interspeech.2018-1403.


@inproceedings{Cheng2018,
  author={Gaofeng Cheng and Daniel Povey and Lu Huang and Ji Xu and Sanjeev Khudanpur and Yonghong Yan},
  title={Output-Gate Projected Gated Recurrent Unit for Speech Recognition},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={1793--1797},
  doi={10.21437/Interspeech.2018-1403},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1403}
}