Sequence-discriminative training of deep neural networks (DNNs) is investigated on a 300-hour American English conversational telephone speech task. Four sequence-discriminative criteria are compared: maximum mutual information (MMI), minimum phone error (MPE), state-level minimum Bayes risk (sMBR), and boosted MMI (BMMI). Two heuristics are investigated to improve the performance of DNNs trained using sequence-based criteria: lattices are regenerated after the first iteration of training; and, for MMI and BMMI, frames where the numerator and denominator hypotheses are disjoint are removed from the gradient computation. Starting from a competitive DNN baseline trained using cross-entropy, the sequence-discriminative criteria are shown to lower word error rates by 8.9% relative, on average, with little difference observed between the individual sequence-based criteria. The experiments are done using the open-source Kaldi toolkit, which makes it possible for the wider community to reproduce these results.
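For readers unfamiliar with the criteria named above, the standard formulations of MMI and boosted MMI (following the usual lattice-based presentation; the symbols below are the conventional ones, not notation taken from this abstract) can be sketched as:

```latex
% MMI: log-ratio of the numerator (reference) path score to the
% sum over all competing hypotheses in the denominator lattice.
% O_u: observations of utterance u; S_u: reference state sequence;
% W: word sequence; \kappa: acoustic scaling factor.
\mathcal{F}_{\mathrm{MMI}} = \sum_{u} \log
  \frac{p(O_u \mid S_u)^{\kappa}\, P(W_u)}
       {\sum_{W} p(O_u \mid S_W)^{\kappa}\, P(W)}

% Boosted MMI additionally boosts the likelihood of paths with
% more errors, via a boosting factor b and a (phone- or
% state-level) accuracy function A(W, W_u):
\mathcal{F}_{\mathrm{bMMI}} = \sum_{u} \log
  \frac{p(O_u \mid S_u)^{\kappa}\, P(W_u)}
       {\sum_{W} p(O_u \mid S_W)^{\kappa}\, P(W)\, e^{-b\, A(W, W_u)}}
```

MPE and sMBR instead maximize an expected accuracy over the lattice, with the accuracy measured at the phone level (MPE) or the HMM-state level (sMBR).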
Bibliographic reference: Veselý, Karel / Ghoshal, Arnab / Burget, Lukáš / Povey, Daniel (2013): "Sequence-discriminative training of deep neural networks", in Proc. INTERSPEECH 2013, pp. 2345-2349.