INTERSPEECH 2012

Training neural network acoustic models with sequence-discriminative criteria, such as state-level minimum Bayes risk (sMBR), has been shown to produce large improvements in performance over cross-entropy. However, because they entail the processing of lattices, sequence criteria are much more computationally intensive than cross-entropy. We describe a distributed neural network training algorithm, based on Hessian-free optimization, that scales to deep networks and large data sets. For the sMBR criterion, this training algorithm is faster than stochastic gradient descent by a factor of 5.5 and yields a 4.4% relative improvement in word error rate on a 50-hour broadcast news task. Distributed Hessian-free sMBR training yields relative reductions in word error rate of 7-13% over cross-entropy training with stochastic gradient descent on two larger tasks: Switchboard and DARPA RATS noisy Levantine Arabic. Our best Switchboard DBN achieves a word error rate of 16.4% on rt03 FSH.
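The core idea behind Hessian-free optimization is a truncated Newton step: rather than forming the Hessian explicitly, each update solves the Newton system with conjugate gradient, which needs only Hessian-vector products. The sketch below illustrates that generic mechanism on a toy quadratic, using finite-difference Hessian-vector products and a fixed damping term; it is only an illustration of the optimizer family, not the paper's distributed implementation, its Gauss-Newton curvature products, or the sMBR lattice machinery. The function names (`hessian_free_step`, `hvp`) and all hyperparameter values are illustrative choices, not from the paper.

```python
import numpy as np

def hessian_free_step(grad_fn, x, cg_iters=50, eps=1e-6, damping=1e-3):
    """One truncated-Newton step: solve (H + damping*I) d = -g with CG,
    using finite-difference Hessian-vector products (generic sketch)."""
    g = grad_fn(x)

    def hvp(v):
        # Finite-difference approximation of H @ v, plus damping.
        return (grad_fn(x + eps * v) - g) / eps + damping * v

    # Conjugate gradient applied to hvp(d) = -g.
    d = np.zeros_like(x)
    r = -g.copy()          # residual
    p = r.copy()           # search direction
    rs = r @ r
    for _ in range(cg_iters):
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if rs_new < 1e-12:  # converged (truncation in practice is earlier)
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x + d

# Toy example: minimize f(x) = 0.5 * x^T A x - b^T x.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b

x = np.zeros(2)
for _ in range(3):
    x = hessian_free_step(grad, x)
# x is now close to the minimizer np.linalg.solve(A, b)
```

In a neural-network setting, `grad_fn` would be the backpropagated gradient over a (mini-)batch, and the matrix-vector products would typically use the Gauss-Newton matrix for positive semi-definiteness; the distributed aspect of the paper's method parallelizes exactly these gradient and curvature-product computations across machines.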
Index Terms: deep learning, discriminative training, second-order optimization, distributed computing
Bibliographic reference. Kingsbury, Brian / Sainath, Tara N. / Soltau, Hagen (2012): "Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization", In INTERSPEECH-2012, 10-13.