Lattice Generation in Attention-Based Speech Recognition Models

Michał Zapotoczny, Piotr Pietrzak, Adrian Łańcucki, Jan Chorowski


Attention-based neural speech recognition models are frequently decoded with beam search, which produces a tree of hypotheses. In many cases, such as when using external language models, numerous decoding hypotheses need to be considered, requiring large beam sizes during decoding. We demonstrate that it is possible to merge certain nodes in a tree of hypotheses, in order to obtain a decoding lattice, which increases the number of decoding hypotheses without increasing the number of candidates that are scored by the neural network. We propose a convolutional architecture, which facilitates comparing states of the model at different positions. The experiments are carried out on the Wall Street Journal dataset, where the lattice decoder obtains lower word error rates with smaller beam sizes than an otherwise similar architecture with regular beam search.
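The core idea of collapsing a hypothesis tree into a lattice can be illustrated with a toy sketch. The snippet below is not the paper's method (which learns a convolutional comparison of decoder states); instead it uses a crude stand-in, merging hypotheses that end in the same token, to show how merging caps the number of candidates the scorer must evaluate per step. All names (`next_scores`, `beam_search`, the toy vocabulary and distribution) are illustrative assumptions.

```python
import math

# Toy next-token scorer: log-probs depend only on the last token.
# This is a stand-in for a real attention decoder.
VOCAB = ["a", "b", "</s>"]

def next_scores(last_token):
    # Deterministic toy distribution over the vocabulary.
    base = {"<s>": [0.5, 0.4, 0.1], "a": [0.2, 0.6, 0.2],
            "b": [0.6, 0.2, 0.2], "</s>": [0.0, 0.0, 1.0]}
    return {tok: math.log(p) if p > 0 else -1e9
            for tok, p in zip(VOCAB, base[last_token])}

def beam_search(beam_size, steps, merge=False):
    # Each hypothesis: (token tuple, cumulative log-prob).
    beam = [(("<s>",), 0.0)]
    scored = 0  # how many hypotheses the "model" had to score
    for _ in range(steps):
        expanded = []
        for toks, lp in beam:
            scored += 1
            for tok, s in next_scores(toks[-1]).items():
                expanded.append((toks + (tok,), lp + s))
        if merge:
            # Lattice-style merge: hypotheses ending in the same token
            # are assumed to have comparable decoder states and are
            # collapsed, keeping the best-scoring path. The paper
            # replaces this crude key with a learned state comparison.
            best = {}
            for toks, lp in expanded:
                key = toks[-1]
                if key not in best or lp > best[key][1]:
                    best[key] = (toks, lp)
            expanded = list(best.values())
        beam = sorted(expanded, key=lambda h: -h[1])[:beam_size]
    return beam, scored

tree_beam, tree_scored = beam_search(beam_size=4, steps=3, merge=False)
lat_beam, lat_scored = beam_search(beam_size=4, steps=3, merge=True)
print(tree_scored, lat_scored)  # merging scores fewer candidates
```

With merging, the effective beam never exceeds the number of distinct merge keys, so the scorer evaluates fewer candidates per step while the lattice still implicitly represents every merged path.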


DOI: 10.21437/Interspeech.2019-2667

Cite as: Zapotoczny, M., Pietrzak, P., Łańcucki, A., Chorowski, J. (2019) Lattice Generation in Attention-Based Speech Recognition Models. Proc. Interspeech 2019, 2225-2229, DOI: 10.21437/Interspeech.2019-2667.


@inproceedings{Zapotoczny2019,
  author={Michał Zapotoczny and Piotr Pietrzak and Adrian Łańcucki and Jan Chorowski},
  title={{Lattice Generation in Attention-Based Speech Recognition Models}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2225--2229},
  doi={10.21437/Interspeech.2019-2667},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2667}
}