GPU-Based WFST Decoding with Extra Large Language Model

Daisuke Fukunaga, Yoshiki Tanaka, Yuichi Kageyama

Weighted finite-state transducer (WFST) decoding in speech recognition can be accelerated by using graphics processing units (GPUs). To obtain high recognition accuracy in a WFST-based speech recognition system, a very large language model (LM), represented as a WFST with more than 10 GB of data, is required. Since a GPU typically has only several GB of memory, it is impossible to store such a large LM in GPU memory. In this paper, we propose a new method for WFST decoding on a GPU. The method utilizes the on-the-fly rescoring algorithm, which performs the Viterbi search on a WFST with a small LM and rescores hypotheses with a large LM during decoding. We solve the problem of insufficient GPU memory by storing most of the large LM in host memory and copying the data from host memory to GPU memory as needed at runtime. Our evaluation of the proposed method on the LibriSpeech test sets using an NVIDIA Tesla V100 GPU shows that it achieves decoding ten times faster than an equivalent CPU implementation, with no degradation in recognition accuracy.
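The core idea of on-the-fly rescoring can be illustrated with a minimal sketch (not the paper's implementation): each hypothesis carries a Viterbi score that already includes the small LM's log-probability, and rescoring replaces that contribution with the large LM's score. The dictionaries, bigram context, and `default` back-off value below are hypothetical stand-ins; in the paper's GPU setting, the large-LM lookup would trigger an on-demand copy from host memory rather than a plain dict access.

```python
def rescore(hypotheses, small_lm, large_lm, default=-10.0):
    """Swap each hypothesis's small-LM log-probability for the large-LM one.

    hypotheses: list of (word_tuple, viterbi_score) pairs, where
    viterbi_score already includes the small-LM contribution.
    small_lm / large_lm: dicts mapping n-gram context tuples to log-probs
    (the large-LM lookup stands in for an on-demand host-to-GPU fetch).
    """
    rescored = []
    for words, score in hypotheses:
        for i in range(len(words)):
            # Bigram context, purely for illustration; real systems use
            # higher-order n-grams encoded in the LM WFST.
            key = tuple(words[max(0, i - 1):i + 1])
            small = small_lm.get(key, default)
            large = large_lm.get(key, default)
            score += large - small  # correct score by the LM difference
        rescored.append((words, score))
    # Best hypothesis (highest log-probability) first.
    return sorted(rescored, key=lambda h: h[1], reverse=True)
```

For example, a hypothesis that the small LM slightly dispreferred can overtake the beam's best path once the large LM assigns it a higher log-probability, which is exactly why rescoring during (rather than after) decoding helps keep the right paths inside the beam.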

DOI: 10.21437/Interspeech.2019-2101

Cite as: Fukunaga, D., Tanaka, Y., Kageyama, Y. (2019) GPU-Based WFST Decoding with Extra Large Language Model. Proc. Interspeech 2019, 3815-3819, DOI: 10.21437/Interspeech.2019-2101.

@inproceedings{fukunaga2019gpu,
  author={Daisuke Fukunaga and Yoshiki Tanaka and Yuichi Kageyama},
  title={{GPU-Based WFST Decoding with Extra Large Language Model}},
  booktitle={Proc. Interspeech 2019},
  pages={3815--3819},
  doi={10.21437/Interspeech.2019-2101},
  year={2019}
}