ISCA Archive Interspeech 2013

Sequence-discriminative training of deep neural networks

Karel Veselý, Arnab Ghoshal, Lukáš Burget, Daniel Povey

Sequence-discriminative training of deep neural networks (DNNs) is investigated on a 300-hour American English conversational telephone speech task. Different sequence-discriminative criteria are compared: maximum mutual information (MMI), minimum phone error (MPE), state-level minimum Bayes risk (sMBR), and boosted MMI (BMMI). Two different heuristics are investigated to improve the performance of the DNNs trained using sequence-based criteria: lattices are regenerated after the first iteration of training; and, for MMI and BMMI, the frames where the numerator and denominator hypotheses are disjoint are removed from the gradient computation. Starting from a competitive DNN baseline trained using cross-entropy, different sequence-discriminative criteria are shown to lower word error rates by 8.9% relative, on average. Little difference is noticed between the different sequence-based criteria that are investigated. The experiments are done using the open-source Kaldi toolkit, which makes it possible for the wider community to reproduce these results.
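
The second heuristic in the abstract (dropping frames where the numerator and denominator hypotheses are disjoint) can be illustrated with a minimal sketch. This is not the Kaldi implementation; the function name `mmi_frame_error_signals`, the dict-per-frame posterior layout, and the `drop_disjoint` flag are all assumptions made for illustration. It also assumes the usual form of the MMI error signal for a frame, the numerator occupancy minus the denominator occupancy of each state, and ignores any acoustic scaling.

```python
def mmi_frame_error_signals(num_post, den_post, drop_disjoint=True):
    """Per-frame MMI error signals (numerator minus denominator occupancy).

    num_post, den_post: lists with one dict per frame, mapping a state id
    to its posterior (occupancy) from the numerator alignment and the
    denominator lattice respectively. This layout is hypothetical.

    If drop_disjoint is True, a frame whose numerator states share no
    state with the denominator states contributes an empty error signal,
    i.e. it is excluded from the gradient computation.
    """
    signals = []
    for num, den in zip(num_post, den_post):
        # States that actually carry probability mass in each hypothesis set.
        num_states = {s for s, p in num.items() if p > 0.0}
        den_states = {s for s, p in den.items() if p > 0.0}

        if drop_disjoint and num_states.isdisjoint(den_states):
            signals.append({})  # frame rejected: no gradient contribution
            continue

        signals.append({
            s: num.get(s, 0.0) - den.get(s, 0.0)
            for s in sorted(num_states | den_states)
        })
    return signals
```

In this sketch, a frame whose reference state also appears in the denominator lattice yields a nonzero signal for every competing state, while a frame with no overlap yields an empty dict. Setting `drop_disjoint=False` keeps such frames, which allows a side-by-side comparison of the two behaviours.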


doi: 10.21437/Interspeech.2013-548

Cite as: Veselý, K., Ghoshal, A., Burget, L., Povey, D. (2013) Sequence-discriminative training of deep neural networks. Proc. Interspeech 2013, 2345-2349, doi: 10.21437/Interspeech.2013-548

@inproceedings{vesely13_interspeech,
  author={Karel Veselý and Arnab Ghoshal and Lukáš Burget and Daniel Povey},
  title={{Sequence-discriminative training of deep neural networks}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={2345--2349},
  doi={10.21437/Interspeech.2013-548}
}