ISCA Archive Interspeech 2013

Speed up of recurrent neural network language models with sentence independent subsampling stochastic gradient descent

Yangyang Shi, Mei-Yuh Hwang, Kaisheng Yao, Martha Larson

Recurrent neural network based language models (RNNLMs) have been demonstrated to outperform traditional n-gram language models in automatic speech recognition. However, this superior performance comes at the cost of expensive model training. In this paper, we propose a sentence-independent subsampling stochastic gradient descent algorithm (SIS-SGD) that speeds up RNNLM training using parallel processing under the assumption that sentences are independent. The approach maps the training of the overall model onto stochastic gradient descent training of submodels; the update directions of the submodels are aggregated and used as the weight update for the whole model. In the experiments, synchronous and asynchronous SIS-SGD are implemented and compared. Using a multi-thread technique, synchronous SIS-SGD achieves a 3-fold speed-up without losing performance in terms of word error rate (WER). When multiple processors are used, a nearly 11-fold speed-up can be attained with a relative WER increase of only 3%.
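The synchronous variant described above can be pictured as follows: each worker runs plain SGD on its own copy of the weights over a sentence-level subsample, and the resulting update directions are aggregated into a single update for the shared model. Below is a minimal Python sketch of that idea; the names (sis_sgd_epoch, grad_fn, n_workers, lr) are illustrative assumptions, not the authors' implementation, and the worker loop is written sequentially where the paper uses threads or separate processes.

import numpy as np

def sis_sgd_epoch(weights, sentences, grad_fn, n_workers=4, lr=0.1, seed=0):
    # One synchronous SIS-SGD sweep (hypothetical sketch).
    # Each "worker" trains a submodel (a copy of the weights) with plain SGD
    # on a random sentence-level subsample; sentences are assumed independent,
    # so the subsamples can be drawn freely.
    rng = np.random.default_rng(seed)
    deltas = []
    for _ in range(n_workers):
        subsample = rng.choice(len(sentences),
                               size=len(sentences) // n_workers,
                               replace=False)
        local = weights.copy()
        for idx in subsample:
            local -= lr * grad_fn(local, sentences[idx])  # SGD step on the submodel
        deltas.append(local - weights)                     # this submodel's update direction
    # Aggregate the submodel update directions and apply them to the whole model.
    return weights + np.mean(deltas, axis=0)

In the paper's setting the workers run concurrently (threads for the synchronous version, multiple processors for the other configuration); the sketch only shows the aggregation step that turns the submodel updates into the weight update for the whole model.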


doi: 10.21437/Interspeech.2013-327

Cite as: Shi, Y., Hwang, M.-Y., Yao, K., Larson, M. (2013) Speed up of recurrent neural network language models with sentence independent subsampling stochastic gradient descent. Proc. Interspeech 2013, 1203-1207, doi: 10.21437/Interspeech.2013-327

@inproceedings{shi13b_interspeech,
  author={Yangyang Shi and Mei-Yuh Hwang and Kaisheng Yao and Martha Larson},
  title={{Speed up of recurrent neural network language models with sentence independent subsampling stochastic gradient descent}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={1203--1207},
  doi={10.21437/Interspeech.2013-327}
}