Large Margin Training for Attention Based End-to-End Speech Recognition

Peidong Wang, Jia Cui, Chao Weng, Dong Yu


End-to-end speech recognition systems are typically evaluated using the maximum a posteriori criterion. Since only one hypothesis is involved during evaluation, the ideal number of hypotheses for training should also be one. In this study, we propose a large margin training scheme for attention-based end-to-end speech recognition. Using only one training hypothesis, the large margin training strategy achieves the same performance as the minimum word error rate criterion using four hypotheses. The theoretical derivation in this study is widely applicable to other sequence discriminative criteria such as maximum mutual information. In addition, this paper provides a more succinct formulation of the large margin concept, paving the way towards a better combination of support vector machines and deep neural networks.


DOI: 10.21437/Interspeech.2019-1680

Cite as: Wang, P., Cui, J., Weng, C., Yu, D. (2019) Large Margin Training for Attention Based End-to-End Speech Recognition. Proc. Interspeech 2019, 246-250, DOI: 10.21437/Interspeech.2019-1680.


@inproceedings{Wang2019,
  author={Peidong Wang and Jia Cui and Chao Weng and Dong Yu},
  title={{Large Margin Training for Attention Based End-to-End Speech Recognition}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={246--250},
  doi={10.21437/Interspeech.2019-1680},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1680}
}