8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Large-Scale Random Forest Language Models for Speech Recognition

Yi Su, Frederick Jelinek, Sanjeev Khudanpur

Johns Hopkins University, USA

The random forest language model (RFLM) has shown encouraging results in several automatic speech recognition (ASR) tasks but has been hindered by practical limitations, notably the space complexity of RFLM estimation from large amounts of data. This paper addresses large-scale training and testing of the RFLM via an efficient disk-swapping strategy that exploits the recursive structure of a binary decision tree and the local access property of the tree-growing algorithm. The strategy realizes the full potential of the RFLM and opens avenues for further research, including useful comparisons with n-gram models. Its benefits are demonstrated by perplexity reduction and lattice rescoring experiments using a state-of-the-art ASR system.
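The key observation behind disk swapping is that growing a binary decision tree is recursive and local: a node needs only its own training events. The following is a minimal sketch of that idea and is not the authors' implementation. While the left subtree is grown, the right child's events are kept on disk and reloaded only when that subtree's turn comes. Everything here is a hypothetical illustration: the helper names (`grow`, `lookup`, `_spill`, `_load`), the `min_events` stopping rule, and the toy question "is `history_id < threshold`?". The paper's actual node-splitting algorithm and question set are not reproduced.

```python
import os
import pickle
import tempfile
from collections import Counter


def _spill(events, directory):
    """Write a node's (history_id, word) events to a temp file; return its path."""
    fd, path = tempfile.mkstemp(suffix=".pkl", dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(events, f)
    return path


def _load(path):
    """Read spilled events back into memory and delete the temp file."""
    with open(path, "rb") as f:
        events = pickle.load(f)
    os.remove(path)
    return events


def grow(events, directory, min_events=4):
    """Grow a toy binary tree over (history_id, word) events.

    Hypothetical split question: "is history_id < threshold?", where the
    threshold is the median distinct history id. grow() consumes its input
    list: when a node is split, the list is cleared after partitioning.
    """
    histories = sorted({h for h, _ in events})
    if len(events) < min_events or len(histories) < 2:
        # Leaf: keep only word counts for the histories that reach it.
        return {"leaf": Counter(w for _, w in events)}
    threshold = histories[len(histories) // 2]
    left = [e for e in events if e[0] < threshold]
    right = [e for e in events if e[0] >= threshold]
    events.clear()  # this node's own list is no longer needed
    right_path = _spill(right, directory)  # park the right child on disk
    del right  # drop the only in-memory reference to the right child's list
    left_tree = grow(left, directory, min_events)  # local access: left data only
    del left
    right_tree = grow(_load(right_path), directory, min_events)
    return {"threshold": threshold, "left": left_tree, "right": right_tree}


def lookup(tree, history_id):
    """Follow the questions down to a leaf and return its word counts."""
    while "leaf" not in tree:
        tree = tree["left"] if history_id < tree["threshold"] else tree["right"]
    return tree["leaf"]
```

With this recursion order, each pending right sibling along the current root-to-node path sits on disk instead of in memory. Only the node being split is ever fully loaded. A real system would also need to spill completed subtrees and use the paper's own split criterion.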


Bibliographic reference.  Su, Yi / Jelinek, Frederick / Khudanpur, Sanjeev (2007): "Large-scale random forest language models for speech recognition", In INTERSPEECH-2007, 598-601.