Real-Time One-Pass Decoder for Speech Recognition Using LSTM Language Models

Javier Jorge, Adrià Giménez, Javier Iranzo-Sánchez, Jorge Civera, Albert Sanchis, Alfons Juan


Recurrent Neural Networks, in particular Long Short-Term Memory (LSTM) networks, are widely used in Automatic Speech Recognition for language modelling during decoding, usually as a mechanism for hypothesis rescoring. This paper proposes a new architecture to perform real-time one-pass decoding using LSTM language models. To make decoding efficient, the estimation of look-ahead scores was accelerated by precomputing static look-ahead tables from a pruned n-gram model, drastically reducing the computational cost during decoding. Additionally, the LSTM language model was evaluated efficiently using Variance Regularization together with a lazy evaluation strategy. The proposed one-pass decoder architecture was evaluated on the well-known LibriSpeech and TED-LIUM v3 datasets. Results showed that the proposed algorithm achieves very competitive WERs at real-time factors (RTFs) of around 0.6. Finally, our one-pass decoder is compared with a decoupled two-pass decoder.
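
To illustrate the idea behind the efficient LSTM language model evaluation described in the abstract, the following is a minimal sketch (not the authors' implementation) of how Variance Regularization and lazy evaluation can be combined at query time: the softmax log-normalizer is assumed to be approximately constant (log_z here), so scoring a single word only touches one row of the output layer, and LSTM states are computed and cached only for histories that surviving hypotheses actually need. Class and parameter names (LazyLSTMScorer, log_z, the layer sizes) are hypothetical.

```python
import torch
import torch.nn as nn

class LazyLSTMScorer(nn.Module):
    """Sketch of single-word LSTM LM scoring without a full softmax,
    assuming a variance-regularized model with a near-constant normalizer."""

    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256, log_z=0.0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.output = nn.Linear(hidden_dim, vocab_size)
        self.log_z = log_z   # assumed constant log-normalizer (Variance Regularization)
        self._cache = {}     # lazy evaluation: history tuple -> (hidden vector, LSTM state)

    def _hidden_for(self, history):
        # Compute (and cache) the LSTM state only when a hypothesis needs this
        # history, reusing the cached state of the longest matching prefix.
        if history in self._cache:
            return self._cache[history]
        prev = self._hidden_for(history[:-1])[1] if len(history) > 1 else None
        emb = self.embedding(torch.tensor([[history[-1]]]))
        out, state = self.lstm(emb, prev)
        self._cache[history] = (out[0, -1], state)
        return self._cache[history]

    def score(self, history, word):
        # log p(word | history) ~= logit(word) - log_z: one dot product per query,
        # skipping the summation over the whole vocabulary.
        hidden, _ = self._hidden_for(tuple(history))
        logit = hidden @ self.output.weight[word] + self.output.bias[word]
        return (logit - self.log_z).item()

# Hypothetical usage: score word id 42 after the (word-id) history [1, 7, 3].
scorer = LazyLSTMScorer(vocab_size=10000)
print(scorer.score([1, 7, 3], 42))
```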


 DOI: 10.21437/Interspeech.2019-2798

Cite as: Jorge, J., Giménez, A., Iranzo-Sánchez, J., Civera, J., Sanchis, A., Juan, A. (2019) Real-Time One-Pass Decoder for Speech Recognition Using LSTM Language Models. Proc. Interspeech 2019, 3820-3824, DOI: 10.21437/Interspeech.2019-2798.


@inproceedings{Jorge2019,
  author={Javier Jorge and Adrià Giménez and Javier Iranzo-Sánchez and Jorge Civera and Albert Sanchis and Alfons Juan},
  title={{Real-Time One-Pass Decoder for Speech Recognition Using LSTM Language Models}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3820--3824},
  doi={10.21437/Interspeech.2019-2798},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2798}
}