Investigation on the Combination of Batch Normalization and Dropout in BLSTM-based Acoustic Modeling for ASR

Li Wenjie, Gaofeng Cheng, Fengpei Ge, Pengyuan Zhang, Yonghong Yan


The Long Short-Term Memory (LSTM) architecture is a recurrent neural network well suited to modeling sequential data such as speech. It has recently been widely used in large-scale acoustic model estimation and outperforms many other neural networks. Batch normalization (BN) is an effective way to accelerate network training and improve the generalization performance of neural networks. However, applying batch normalization to an LSTM model is more complicated and challenging than to a feed-forward network. In this paper, we explore novel approaches to adding batch normalization to the LSTM model in bidirectional mode. We then investigate ways to combine the BN-BLSTM model with dropout, a traditional method for alleviating overfitting in neural network training. We evaluate the proposed methods on several speech recognition tasks. Experiments show that the best configuration achieves a 9.8% relative reduction in word error rate over the baseline on the Switchboard task, using the full Hub5'2000 evaluation set. In addition, the method is easy to implement and adds little extra computation.
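The abstract does not specify where in the BLSTM the normalization and dropout are inserted, so the following is only a generic, hedged sketch of the two building blocks themselves (batch normalization over a mini-batch of hidden activations, plus inverted dropout), not the paper's exact recipe. All function and variable names here are illustrative assumptions.

```python
import math
import random

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch dimension, then scale and shift.

    `batch` is a list of equal-length activation vectors (one per utterance
    frame in the mini-batch). Statistics are computed per feature, as in
    standard batch normalization.
    """
    n = len(batch)
    dim = len(batch[0])
    out = [[0.0] * dim for _ in range(n)]
    for j in range(dim):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        inv_std = 1.0 / math.sqrt(var + eps)
        for i in range(n):
            out[i][j] = gamma * (batch[i][j] - mean) * inv_std + beta
    return out

def dropout(batch, p=0.5, rng=None, train=True):
    """Inverted dropout: zero each unit with probability p, rescale the rest
    by 1/(1-p) so the expected activation is unchanged at test time."""
    if not train or p == 0.0:
        return [row[:] for row in batch]
    rng = rng or random.Random(0)
    scale = 1.0 / (1.0 - p)
    return [[(x * scale if rng.random() >= p else 0.0) for x in row]
            for row in batch]

# Toy hidden activations: a batch of 4 vectors with hidden size 3.
h = [[2.0, -1.0, 0.5],
     [4.0, 3.0, -0.5],
     [0.0, 1.0, 1.5],
     [6.0, -3.0, 2.5]]
h_bn = batch_norm(h)           # each feature now has mean ~0, variance ~1
h_out = dropout(h_bn, p=0.25)  # some units zeroed, survivors scaled by 4/3
```

In a BLSTM acoustic model these operations would be applied to the recurrent layers' outputs (forward and backward streams) during training; the specific insertion points and their interaction with dropout are exactly what the paper investigates.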


DOI: 10.21437/Interspeech.2018-1597

Cite as: Wenjie, L., Cheng, G., Ge, F., Zhang, P., Yan, Y. (2018) Investigation on the Combination of Batch Normalization and Dropout in BLSTM-based Acoustic Modeling for ASR. Proc. Interspeech 2018, 2888-2892, DOI: 10.21437/Interspeech.2018-1597.


@inproceedings{Wenjie2018,
  author={Li Wenjie and Gaofeng Cheng and Fengpei Ge and Pengyuan Zhang and Yonghong Yan},
  title={Investigation on the Combination of Batch Normalization and Dropout in BLSTM-based Acoustic Modeling for ASR},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={2888--2892},
  doi={10.21437/Interspeech.2018-1597},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1597}
}