End-to-end Speech Recognition Using Lattice-free MMI

Hossein Hadian, Hossein Sameti, Daniel Povey, Sanjeev Khudanpur


We present our work on end-to-end training of acoustic models using the lattice-free maximum mutual information (LF-MMI) objective function in the context of hidden Markov models. By end-to-end training, we mean flat-start training of a single DNN in one stage, without using any previously trained models or forced alignments and without building state-tying decision trees. We use full biphones to enable context-dependent modeling without trees, and show that our end-to-end LF-MMI approach achieves results comparable to regular LF-MMI on well-known large vocabulary tasks. We also compare with other end-to-end methods such as CTC in character-based and lexicon-free settings, and show a 5 to 25 percent relative reduction in word error rate on different large vocabulary tasks while using significantly smaller models.
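
The "relative reduction" quoted in the abstract is measured against the baseline word error rate, not in absolute percentage points. A minimal sketch of that arithmetic follows. The function name and the example numbers are assumptions made for illustration and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction, in percent, of new_wer over baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers for illustration only; they are NOT results from the paper.
baseline = 20.0  # WER (%) of a hypothetical baseline system
proposed = 15.0  # WER (%) of a hypothetical improved system
print(f"{relative_wer_reduction(baseline, proposed):.1f}% relative reduction")
```

With these made-up values, a drop of 5 absolute points from a 20 percent baseline is a reduction of one quarter of the baseline error.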


DOI: 10.21437/Interspeech.2018-1423

Cite as: Hadian, H., Sameti, H., Povey, D., Khudanpur, S. (2018) End-to-end Speech Recognition Using Lattice-free MMI. Proc. Interspeech 2018, 12-16, DOI: 10.21437/Interspeech.2018-1423.


@inproceedings{Hadian2018,
  author={Hossein Hadian and Hossein Sameti and Daniel Povey and Sanjeev Khudanpur},
  title={End-to-end Speech Recognition Using Lattice-free MMI},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={12--16},
  doi={10.21437/Interspeech.2018-1423},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1423}
}