13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization

Brian Kingsbury, Tara N. Sainath, Hagen Soltau

IBM T. J. Watson Research Center, Yorktown Heights, NY, USA

Training neural network acoustic models with sequence-discriminative criteria, such as state-level minimum Bayes risk (sMBR), has been shown to produce large improvements in performance over cross-entropy. However, because they entail the processing of lattices, sequence criteria are much more computationally intensive than cross-entropy. We describe a distributed neural network training algorithm, based on Hessian-free optimization, that scales to deep networks and large data sets. For the sMBR criterion, this training algorithm is faster than stochastic gradient descent by a factor of 5.5 and yields a 4.4% relative improvement in word error rate on a 50-hour broadcast news task. Distributed Hessian-free sMBR training yields relative reductions in word error rate of 7-13% over cross-entropy training with stochastic gradient descent on two larger tasks: Switchboard and DARPA RATS noisy Levantine Arabic. Our best Switchboard DBN achieves a word error rate of 16.4% on rt03-FSH.
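To make the core idea concrete: Hessian-free (truncated Newton) optimization never forms the curvature matrix explicitly; each update solves H d = -g approximately by conjugate gradient, using only curvature-vector products. The sketch below is illustrative only, not the authors' implementation: it uses a toy quadratic loss and a finite-difference curvature-vector product, whereas the paper's system uses the Gauss-Newton product on the sMBR loss and distributes the gradient and product computations across workers.

```python
import numpy as np

# Minimal sketch of one Hessian-free (truncated Newton) step.
# Toy objective: f(w) = 0.5 w^T A w - b^T w, with gradient A w - b.
# All function names here are illustrative assumptions.

def grad(w, A, b):
    return A @ w - b

def hess_vec(w, v, A, b, eps=1e-6):
    # Curvature-vector product via a finite difference of the gradient.
    # Real HF systems use the Gauss-Newton product computed by an
    # extra forward/backward pass instead.
    return (grad(w + eps * v, A, b) - grad(w - eps * v, A, b)) / (2 * eps)

def conjugate_gradient(w, g, A, b, iters=50, tol=1e-10):
    # Approximately solve H d = -g using only H*v products.
    d = np.zeros_like(g)
    r = -g.copy()          # residual of H d = -g at d = 0
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Hp = hess_vec(w, p, A, b)
        alpha = rs / (p @ Hp)
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d

def hf_step(w, A, b):
    # One outer iteration: compute the gradient, then take the
    # CG-approximated Newton step.
    g = grad(w, A, b)
    return w + conjugate_gradient(w, g, A, b)
```

On a strictly convex quadratic this single step recovers the exact Newton step, so the iterate lands at the minimizer; on a nonconvex sMBR loss the method instead takes many damped outer iterations, with the expensive gradient and curvature-product computations parallelized over data shards.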

Index Terms: deep learning, discriminative training, second-order optimization, distributed computing


Bibliographic reference.  Kingsbury, Brian / Sainath, Tara N. / Soltau, Hagen (2012): "Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization", In INTERSPEECH-2012, 10-13.