Lattice-free State-level Minimum Bayes Risk Training of Acoustic Models

Naoyuki Kanda, Yusuke Fujita, Kenji Nagamatsu


Lattice-free maximum mutual information (LF-MMI) training, which enables MMI-based acoustic model training without any lattice generation procedure, has recently been proposed. Although LF-MMI has shown high accuracy in many tasks, its MMI criterion does not necessarily maximize speech recognition accuracy. In this work, we propose lattice-free state-level minimum Bayes risk (LF-sMBR) training, which maximizes state-level expected accuracy without relying on a lattice generation procedure. As with LF-MMI, LF-sMBR avoids redundant lattice generation by exploiting forward-backward calculation over a phone N-gram space, which enables much simpler and faster training based on an sMBR criterion. We found that special care for silence phones was essential for improving accuracy with LF-sMBR. In our experiments on the AMI, CSJ and Librispeech corpora, LF-sMBR achieved small but consistent improvements over LF-MMI acoustic models, showing state-of-the-art results for each test set.
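
The state-level expected accuracy mentioned above can be illustrated with a toy sketch. The abstract gives no formulas, so the following assumes the common definition in which accuracy counts the frames whose hypothesized state matches the reference state; under that assumption, the expected accuracy is the sum over frames of the posterior of the reference state, and those posteriors come from a forward-backward pass. The small dense HMM here stands in for the phone N-gram graph, and the names `state_posteriors` and `expected_state_accuracy` are hypothetical, not taken from the paper or any toolkit.

```python
def state_posteriors(init, trans, obs_lik):
    """Forward-backward over a small dense HMM (toy stand-in for a phone N-gram graph).

    init[s]       : initial probability of state s
    trans[r][s]   : transition probability from state r to state s
    obs_lik[t][s] : acoustic likelihood of frame t under state s
    Returns gamma[t][s], the posterior of being in state s at frame t.
    No scaling or log-domain arithmetic is used, so this only suits short inputs.
    """
    T, S = len(obs_lik), len(init)
    alpha = [[0.0] * S for _ in range(T)]
    beta = [[0.0] * S for _ in range(T)]

    # Forward pass.
    for s in range(S):
        alpha[0][s] = init[s] * obs_lik[0][s]
    for t in range(1, T):
        for s in range(S):
            alpha[t][s] = obs_lik[t][s] * sum(
                alpha[t - 1][r] * trans[r][s] for r in range(S)
            )

    # Backward pass.
    for s in range(S):
        beta[T - 1][s] = 1.0
    for t in range(T - 2, -1, -1):
        for s in range(S):
            beta[t][s] = sum(
                trans[s][r] * obs_lik[t + 1][r] * beta[t + 1][r] for r in range(S)
            )

    total = sum(alpha[T - 1])  # likelihood summed over all state sequences
    return [[alpha[t][s] * beta[t][s] / total for s in range(S)] for t in range(T)]


def expected_state_accuracy(gamma, ref_states):
    """Expected number of frames whose state matches the reference alignment."""
    return sum(gamma[t][ref_states[t]] for t in range(len(ref_states)))


# Toy input (made-up values): the likelihoods allow only the path 0, 1, 1,
# so every frame of the reference alignment counts as correct.
init = [0.5, 0.5]
trans = [[0.5, 0.5], [0.5, 0.5]]
obs_lik = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
gamma = state_posteriors(init, trans, obs_lik)
acc = expected_state_accuracy(gamma, [0, 1, 1])
```

This sketch only evaluates the criterion. Training would also need its gradient with respect to the acoustic model outputs, and the silence-phone handling the abstract calls essential is not modeled here.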


DOI: 10.21437/Interspeech.2018-79

Cite as: Kanda, N., Fujita, Y., Nagamatsu, K. (2018) Lattice-free State-level Minimum Bayes Risk Training of Acoustic Models. Proc. Interspeech 2018, 2923-2927, DOI: 10.21437/Interspeech.2018-79.


@inproceedings{Kanda2018,
  author={Naoyuki Kanda and Yusuke Fujita and Kenji Nagamatsu},
  title={Lattice-free State-level Minimum Bayes Risk Training of Acoustic Models},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={2923--2927},
  doi={10.21437/Interspeech.2018-79},
  url={http://dx.doi.org/10.21437/Interspeech.2018-79}
}