The random forest language model (RFLM) has shown encouraging results in several automatic speech recognition (ASR) tasks, but it has been hindered by practical limitations, notably the space complexity of RFLM estimation from large amounts of data. This paper addresses large-scale training and testing of the RFLM via an efficient disk-swapping strategy that exploits the recursive structure of a binary decision tree and the local access property of the tree-growing algorithm. The strategy realizes the full potential of the RFLM and opens avenues for further research, including useful comparisons with n-gram models. Its benefits are demonstrated by perplexity reduction and lattice rescoring experiments using a state-of-the-art ASR system.
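The disk-swapping idea can be illustrated with a minimal sketch. It is not the authors' implementation, only one way to exploit the two properties named above: growing a node needs only that node's data (local access), and each child is grown by the same procedure (recursive structure). All names here (`dump`, `split`, `grow`, `leaf_total`), the pickle-file layout, and the randomly chosen toy question are assumptions made for illustration.

```python
import os
import pickle
import random
import tempfile


def dump(events, swap_dir):
    """Write a list of events to a fresh file in swap_dir and return its path."""
    fd, path = tempfile.mkstemp(dir=swap_dir, suffix=".pkl")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(events, f)
    return path


def split(events, rng):
    """Toy question (an assumption): is the word at a random history
    position in a random half of the words seen at that position?"""
    if not events:
        return None
    pos = rng.randrange(len(events[0]) - 1)  # last slot is the predicted word
    vocab = sorted({e[pos] for e in events})
    if len(vocab) < 2:
        return None  # nothing to split on at this position
    chosen = set(rng.sample(vocab, len(vocab) // 2))
    yes = [e for e in events if e[pos] in chosen]
    no = [e for e in events if e[pos] not in chosen]
    return (pos, chosen), yes, no


def grow(path, rng, swap_dir, depth=0, max_depth=3):
    """Grow a subtree from the events stored at `path`.

    Only the current node's events are loaded; both children are swapped
    out to disk before recursing, so deeper calls never hold them in memory.
    """
    with open(path, "rb") as f:
        events = pickle.load(f)
    os.remove(path)  # this node's data is now held in memory only
    result = split(events, rng) if depth < max_depth else None
    if result is None:
        return {"leaf": True, "count": len(events)}
    question, yes, no = result
    child_paths = [dump(part, swap_dir) for part in (yes, no)]
    del events, yes, no, result  # drop in-memory copies before recursing
    children = [grow(p, rng, swap_dir, depth + 1, max_depth) for p in child_paths]
    return {"leaf": False, "question": question,
            "yes": children[0], "no": children[1]}


def leaf_total(node):
    """Sum the event counts over all leaves (a sanity check for the sketch)."""
    if node["leaf"]:
        return node["count"]
    return leaf_total(node["yes"]) + leaf_total(node["no"])
```

A caller would write the root events once with `dump(events, swap_dir)` and pass the returned path to `grow`. In this sketch, peak memory is bounded by the largest single node rather than by the whole tree's data, at the cost of one write and one read per node.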
Bibliographic reference. Su, Yi / Jelinek, Frederick / Khudanpur, Sanjeev (2007): "Large-scale random forest language models for speech recognition", In INTERSPEECH-2007, 598-601.