INTERSPEECH 2014

We recently showed that Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) outperform state-of-the-art deep neural networks (DNNs) for large-scale acoustic modeling where the models were trained with the cross-entropy (CE) criterion. It has also been shown that sequence discriminative training of DNNs initially trained with the CE criterion gives significant improvements. In this paper, we investigate sequence discriminative training of LSTM RNNs in a large-scale acoustic modeling task. We train the models in a distributed manner using asynchronous stochastic gradient descent optimization. We compare two sequence discriminative criteria, maximum mutual information and state-level minimum Bayes risk, and we investigate a number of variations of the basic training strategy to better understand issues raised by both the sequential model and the objective function. We obtain significant gains over the CE-trained LSTM RNN model using sequence discriminative training techniques.
Bibliographic reference. Sak, Haşim / Vinyals, Oriol / Heigold, Georg / Senior, Andrew / McDermott, Erik / Monga, Rajat / Mao, Mark (2014): "Sequence discriminative distributed training of long short-term memory recurrent neural networks", In INTERSPEECH 2014, 1209-1213.