15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Lattice Decoding and Rescoring with Long-Span Neural Network Language Models

Martin Sundermeyer, Zoltán Tüske, Ralf Schlüter, Hermann Ney

RWTH Aachen University, Germany

With long-span neural network language models, considerable improvements have been obtained in speech recognition. However, it is difficult to apply these models if the underlying search space is large.

In this paper, we combine previous work on lattice decoding with long short-term memory (LSTM) neural network language models. By adding refined pruning techniques, we are able to reduce the search effort by a factor of three.

Furthermore, we introduce two novel approximations for full lattice rescoring, which make the full range of lattice-based speech recognition techniques applicable to these models. Compared to 1000-best lists, we find that we can increase the word error rate improvements obtained with LSTMs from 8.2% to 10.7% relative over a state-of-the-art baseline, while the resulting lattices are considerably smaller. In addition, we investigate the use of LSTMs for Babel Assamese keyword search, obtaining significant improvements of 2.5% relative.
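The core difficulty the abstract alludes to is that a long-span LSTM language model conditions on the entire word history, so distinct lattice paths can never be merged exactly. A common approximation, in the spirit of the recombination pruning described here, is to rescore the lattice in a push-forward manner and recombine hypotheses whose histories agree in their last few words, keeping only the best-scoring representative. The sketch below illustrates this idea on a toy lattice; the bigram table standing in for the LSTM LM, the lattice layout, and all function names are illustrative assumptions, not the paper's implementation.

```python
from collections import defaultdict

# Toy stand-in for an LSTM LM (illustrative only): log-probabilities
# from a small bigram table. A real LSTM would condition on the
# full history via its hidden state.
LM = {
    ("<s>", "a"): -0.1, ("<s>", "the"): -0.3,
    ("a", "cat"): -0.5, ("the", "cat"): -0.2,
    ("cat", "</s>"): -0.1,
}

def lm_score(history, word):
    # The toy table only inspects the previous word of the history.
    return LM.get((history[-1], word), -2.0)

# Lattice: node -> list of (word, next node, acoustic log-score).
LATTICE = {
    0: [("a", 1, -1.0), ("the", 1, -0.8)],
    1: [("cat", 2, -0.5)],
    2: [("</s>", 3, 0.0)],
}

def rescore(lattice, start=0, final=3, context=1):
    """Push-forward lattice rescoring with history recombination:
    at each node, keep only the best hypothesis per truncated
    history of the last `context` words (the approximation)."""
    # hyps[node][truncated history] = (total score, full word sequence)
    hyps = defaultdict(dict)
    hyps[start][("<s>",)] = (0.0, [])
    for node in sorted(lattice):  # assumes nodes are topologically ordered
        for score, words in list(hyps[node].values()):
            history = ("<s>",) + tuple(words)
            for word, nxt, am in lattice[node]:
                new_score = score + am + lm_score(history, word)
                new_words = words + [word]
                key = tuple(new_words[-context:])
                best = hyps[nxt].get(key)
                if best is None or new_score > best[0]:
                    hyps[nxt][key] = (new_score, new_words)
    return max(hyps[final].values())

best_score, best_words = rescore(LATTICE)
```

With `context=1`, the two partial hypotheses "a" and "the" are recombined after the word "cat", so only one full history survives; raising `context` trades search effort against approximation error, which is the knob the pruning techniques above are tuning.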


Bibliographic reference. Sundermeyer, Martin / Tüske, Zoltán / Schlüter, Ralf / Ney, Hermann (2014): "Lattice Decoding and Rescoring with Long-Span Neural Network Language Models", in INTERSPEECH-2014, 661-665.