14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Speed Up of Recurrent Neural Network Language Models with Sentence Independent Subsampling Stochastic Gradient Descent

Yangyang Shi (1), Mei-Yuh Hwang (1), Kaisheng Yao (1), Martha Larson (2)

(1) Microsoft, China
(2) Technische Universiteit Delft, The Netherlands

Recurrent neural network based language models (RNNLMs) have been shown to outperform traditional n-gram language models in automatic speech recognition. However, this superior performance comes at the cost of expensive model training. In this paper, we propose a sentence-independent subsampling stochastic gradient descent algorithm (SIS-SGD) that speeds up RNNLM training through parallel processing, under the condition that sentences are treated as independent. The approach maps the training of the overall model onto stochastic gradient descent training of submodels. The update directions of the submodels are aggregated and used as the weight update for the whole model. In the experiments, synchronous and asynchronous SIS-SGD are implemented and compared. Using multiple threads, synchronous SIS-SGD can achieve a 3-fold speedup with no loss in word error rate (WER). When multiple processors are used, a nearly 11-fold speedup can be attained with a relative WER increase of only 3%.
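The abstract describes training as stochastic gradient descent over submodels whose update directions are aggregated into one update for the whole model. The sketch below illustrates one synchronous round of that idea under explicit assumptions that do not come from the abstract: each submodel is a copy of the current model trained on a disjoint subset of independent sentences, aggregation is simple averaging of the update directions, and a toy linear least-squares model stands in for the RNNLM. The names `sgd_on_shard` and `sis_sgd_sync_epoch` are hypothetical, and the workers run sequentially rather than in real threads or processes. It is an illustration, not the authors' implementation.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical toy data type: each "sentence" is an independent (features, target) pair.
Sample = Tuple[List[float], float]


def sgd_on_shard(weights: List[float], shard: Sequence[Sample], lr: float) -> List[float]:
    """Run plain SGD over one shard, starting from a copy of the global weights.

    A linear least-squares model stands in for an RNNLM submodel (assumption).
    """
    w = list(weights)
    for x, y in shard:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        # Gradient of 0.5 * err**2 with respect to w is err * x.
        for i, xi in enumerate(x):
            w[i] -= lr * err * xi
    return w


def sis_sgd_sync_epoch(weights: List[float], samples: List[Sample],
                       num_workers: int, lr: float) -> List[float]:
    """One synchronous round: train submodels on disjoint shards, average their updates."""
    # Sentences are assumed independent, so the data can be split into disjoint shards.
    shards = [samples[k::num_workers] for k in range(num_workers)]
    # Each worker trains its own submodel from the same starting weights
    # (simulated sequentially here; a real setup would use threads or processes).
    local = [sgd_on_shard(weights, shard, lr) for shard in shards]
    # Update direction of each submodel relative to the shared starting point.
    deltas = [[lw - w for lw, w in zip(lws, weights)] for lws in local]
    # Aggregate by averaging (assumption) and apply to the global model.
    avg = [sum(d[i] for d in deltas) / num_workers for i in range(len(weights))]
    return [w + a for w, a in zip(weights, avg)]


# Usage: recover y = 2*x0 - 3*x1 from noiseless synthetic data.
rng = random.Random(0)
true_w = [2.0, -3.0]
data: List[Sample] = []
for _ in range(200):
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    data.append((x, sum(t * xi for t, xi in zip(true_w, x))))

w = [0.0, 0.0]
for epoch in range(20):
    w = sis_sgd_sync_epoch(w, data, num_workers=4, lr=0.1)
print([round(v, 3) for v in w])
```

With a single worker the round reduces to ordinary SGD over all the data; with more workers, each shard can be processed independently, which is only valid because no recurrent state is assumed to carry across sentence boundaries. The asynchronous variant compared in the paper, where submodel updates are applied as they arrive instead of after all workers finish, is not shown here.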


Bibliographic reference. Shi, Yangyang / Hwang, Mei-Yuh / Yao, Kaisheng / Larson, Martha (2013): "Speed up of recurrent neural network language models with sentence independent subsampling stochastic gradient descent", In INTERSPEECH-2013, 1203-1207.