The use of Recurrent Neural Network Language Models (RNNLMs) has yielded significant improvements in Automatic Speech Recognition (ASR) tasks. However, because it is in practice not feasible to apply them directly during acoustic trellis search, they are usually exploited for their long-history modeling capability by rescoring N-best lists. In this paper we propose a novel method for directly rescoring the hypotheses contained in the word graphs generated in the first pass of ASR decoding. The method, based on A* stack search, rescores the partial theories on the stack with a log-linear combination of the acoustic model score and a linear combination of multiple language model scores (including the RNNLM). On an ASR task consisting of the automatic transcription of English weather news, we compared the A* based approach with N-best rescoring and with iterative confusion network decoding. Using the proposed method, we measured a relative word error rate improvement of about 6% on the given task with respect to the baseline system. This improvement is comparable with that obtained with N-best list rescoring.
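The search scheme sketched in the abstract can be illustrated with a minimal best-first expansion over a toy word graph. The following Python sketch is purely illustrative and not the authors' implementation: the graph layout, the language models, the interpolation weights, and the function name `astar_rescore` are all assumptions. Each stack entry is a partial theory scored by the acoustic log-score plus a scaled linear interpolation of per-LM log-scores; the heuristic term of A* is omitted (i.e. this degenerates to uniform-cost search), which preserves optimality here because all arc scores are non-positive.

```python
import heapq

def astar_rescore(arcs, start, final, lms, lm_weights, lm_scale=1.0):
    """Best-first search over a word graph (illustrative sketch).

    arcs: dict node -> list of (word, next_node, acoustic_log_score)
    lms: list of callables (history_tuple, word) -> log_score
    lm_weights: linear interpolation weights, one per LM
    lm_scale: log-linear weight of the combined LM score vs. acoustics
    """
    # Stack of partial theories, ordered by negated total score
    # so that heapq pops the highest-scoring theory first.
    heap = [(0.0, start, ())]
    while heap:
        neg_score, node, history = heapq.heappop(heap)
        if node == final:
            return list(history), -neg_score
        for word, nxt, acoustic in arcs.get(node, []):
            # Linear combination of the LM scores for this word...
            lm = sum(w * f(history, word) for w, f in zip(lm_weights, lms))
            # ...log-linearly combined with the acoustic score.
            total = -neg_score + acoustic + lm_scale * lm
            heapq.heappush(heap, (-total, nxt, history + (word,)))
    return None, float("-inf")

# Toy word graph: nodes 0..3, arcs carry (word, next node, acoustic score).
arcs = {
    0: [("the", 1, -1.0), ("a", 1, -1.5)],
    1: [("cat", 2, -2.0), ("hat", 2, -1.0)],
    2: [("sat", 3, -1.0)],
}

# Two toy "language models" (stand-ins for e.g. an n-gram LM and an RNNLM).
lm1 = lambda hist, w: 0.0 if w in ("the", "cat") else -2.0
lm2 = lambda hist, w: 0.0 if w == "hat" else -1.0

best, score = astar_rescore(arcs, 0, 3, [lm1, lm2], [0.6, 0.4])
```

In a real system the per-word LM scores would come from actual n-gram and RNNLM queries conditioned on the full history of the partial theory, which is exactly what word-graph rescoring enables compared with fixed-context first-pass decoding.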
Bibliographic reference. Jalalvand, Shahab / Falavigna, Daniele (2014): "Direct word graph rescoring using A* search and RNNLM", In INTERSPEECH-2014, 2630-2634.