INTERSPEECH 2014
15th Annual Conference of the International Speech Communication Association

Singapore
September 14-18, 2014

Parallel Deep Neural Network Training for LVCSR Tasks Using Blue Gene/Q

Tara N. Sainath, I-hsin Chung, Bhuvana Ramabhadran, Michael Picheny, John Gunnels, Brian Kingsbury, George Saon, Vernon Austel, Upendra Chaudhari

IBM T.J. Watson Research Center, USA

While Deep Neural Networks (DNNs) have achieved tremendous success on LVCSR tasks, training these networks is slow. To date, the most common approach to training DNNs is stochastic gradient descent (SGD), run serially on a single GPU machine. Serial training, coupled with the large number of parameters and the size of speech data sets, makes DNN training very slow for LVCSR tasks. While second-order, data-parallel methods have also been explored, these methods are not always faster on CPU clusters because of the large communication cost between processors. In this work, we explore a specialized hardware/software approach utilizing a Blue Gene/Q (BG/Q) system, which has thousands of processors and excellent inter-processor communication. We explore using the second-order Hessian-free (HF) algorithm for DNN training on BG/Q, for both cross-entropy and sequence training of DNNs. Results on three LVCSR tasks indicate that HF on BG/Q offers up to an 11x speedup, as well as an improved word error rate (WER), compared to SGD on a GPU.
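To illustrate the general idea behind Hessian-free optimization (not the paper's actual implementation, which uses Gauss-Newton curvature-vector products and data parallelism across BG/Q nodes), the sketch below runs HF on a toy quadratic loss: each outer step solves a damped curvature system with conjugate gradient, and the curvature-vector product is approximated by finite differences of the gradient. All names and constants here are illustrative assumptions.

```python
import numpy as np

# Toy loss f(theta) = 0.5 * theta^T A theta - b^T theta (A is SPD),
# standing in for a network's loss surface. Real HF training computes
# Gauss-Newton curvature-vector products via the R-operator; here we
# approximate Hv with a finite difference of the gradient (an assumption
# made for this sketch).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad(theta):
    return A @ theta - b

def hess_vec(theta, v, eps=1e-6):
    # Finite-difference Hessian-vector product:
    # Hv ~= (grad(theta + eps*v) - grad(theta)) / eps
    return (grad(theta + eps * v) - grad(theta)) / eps

def cg_solve(theta, g, damping=1e-3, iters=50, tol=1e-10):
    # Conjugate gradient on the damped system (H + damping*I) d = -g.
    # Only curvature-vector products are needed, never H itself.
    d = np.zeros_like(g)
    r = -g.copy()          # residual with d = 0
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Hp = hess_vec(theta, p) + damping * p
        alpha = rs / (p @ Hp)
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d

# Outer HF loop: repeatedly solve for an update direction and step.
theta = np.zeros(2)
for _ in range(5):
    theta = theta + cg_solve(theta, grad(theta))

# For this quadratic, the optimum satisfies A theta = b.
```

In full-scale HF training for speech, the gradient and curvature-vector products are computed over (mini-)batches of utterances, which is what makes the algorithm amenable to the data-parallel distribution across thousands of BG/Q processors described in the abstract.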


Bibliographic reference.  Sainath, Tara N. / Chung, I-hsin / Ramabhadran, Bhuvana / Picheny, Michael / Gunnels, John / Kingsbury, Brian / Saon, George / Austel, Vernon / Chaudhari, Upendra (2014): "Parallel deep neural network training for LVCSR tasks using Blue Gene/Q", In INTERSPEECH-2014, 1048-1052.