An Exploration of Dropout with LSTMs

Gaofeng Cheng, Vijayaditya Peddinti, Daniel Povey, Vimal Manohar, Sanjeev Khudanpur, Yonghong Yan


Long Short-Term Memory networks (LSTMs) are a component of many state-of-the-art DNN-based speech recognition systems. Dropout is a popular method to improve generalization in DNN training. In this paper we describe extensive experiments in which we investigated the best way to combine dropout with LSTMs — specifically, projected LSTMs (LSTMP). We investigated various locations in the LSTM to place the dropout (and various combinations of locations), and a variety of dropout schedules. Our optimized recipe gives consistent improvements in WER across a range of datasets, including Switchboard, TED-LIUM and AMI.
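As an illustration of the two ingredients the abstract mentions (where to apply dropout in the LSTM, and what dropout schedule to use), here is a minimal Python sketch. This is not the paper's actual recipe: the ramp-up/ramp-down schedule shape, the `p_max` value, and the choice of applying the mask to the projection output are all hypothetical, chosen only to make the idea concrete.

```python
import random

def dropout_schedule(t, total_steps, p_max=0.2):
    """Hypothetical dropout schedule: the dropout proportion ramps
    from 0 up to p_max over the first half of training, then decays
    back to 0. The paper compares several schedules; this particular
    shape is only an illustration."""
    frac = t / total_steps
    if frac < 0.5:
        return p_max * (frac / 0.5)
    return p_max * ((1.0 - frac) / 0.5)

def apply_dropout(vec, p, rng):
    """Inverted dropout on one activation vector (e.g. the output of
    the LSTMP recurrent projection): each unit is zeroed with
    probability p, and surviving units are rescaled by 1/(1-p) so the
    expected activation is unchanged."""
    if p <= 0.0:
        return list(vec)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in vec]

rng = random.Random(0)
h = [0.5] * 10                      # a toy projection-output vector
p = dropout_schedule(50, 100)       # dropout proportion at mid-training
print(p)                            # peak of the schedule (p_max)
print(apply_dropout(h, p, rng))     # masked-and-rescaled activations
```

At inference time no mask is applied; because of the inverted-dropout rescaling, the network can be used as-is without further scaling.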


DOI: 10.21437/Interspeech.2017-129

Cite as: Cheng, G., Peddinti, V., Povey, D., Manohar, V., Khudanpur, S., Yan, Y. (2017) An Exploration of Dropout with LSTMs. Proc. Interspeech 2017, 1586-1590, DOI: 10.21437/Interspeech.2017-129.


@inproceedings{Cheng2017,
  author={Gaofeng Cheng and Vijayaditya Peddinti and Daniel Povey and Vimal Manohar and Sanjeev Khudanpur and Yonghong Yan},
  title={An Exploration of Dropout with LSTMs},
  year={2017},
  booktitle={Proc. Interspeech 2017},
  pages={1586--1590},
  doi={10.21437/Interspeech.2017-129},
  url={http://dx.doi.org/10.21437/Interspeech.2017-129}
}