5th International Conference on Spoken Language Processing
The goal of multi-span language modeling is to integrate the various constraints, both local and global, that are present in the language. In this paper, local constraints are captured via the usual n-gram approach, while global constraints are taken into account through latent semantic analysis. An integrative formulation is derived for combining these two paradigms, resulting in an entirely data-driven, multi-span framework for large-vocabulary speech recognition. Because the two types of constraints are inherently complementary, the integrated language model compares favorably with the corresponding n-gram model. On a subset of the Wall Street Journal speaker-independent, 20,000-word-vocabulary continuous speech task, we observed a reduction in perplexity of about 25% and a reduction in average error rate of about 15%.
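As a rough illustration of how a local n-gram estimate and a global semantic score might be merged into a single distribution, the sketch below uses a renormalized product over the vocabulary and then computes perplexity on the resulting probabilities. The function names `combine` and `perplexity`, the toy two-word vocabulary, and the product rule itself are assumptions made for illustration only; the integrative formulation derived in the paper is not reproduced here.

```python
import math


def combine(ngram_probs, semantic_scores):
    """Merge a local and a global distribution by a renormalized product.

    Both arguments are dicts mapping each vocabulary word to a positive
    score for the current history. This product rule is an illustrative
    assumption, not the paper's derivation.
    """
    unnormalized = {w: ngram_probs[w] * semantic_scores[w] for w in ngram_probs}
    total = sum(unnormalized.values())
    return {w: value / total for w, value in unnormalized.items()}


def perplexity(observed_word_probs):
    """Perplexity from the model probabilities assigned to the observed words."""
    log_sum = sum(math.log(p) for p in observed_word_probs)
    return math.exp(-log_sum / len(observed_word_probs))


# Toy example: the n-gram model is undecided, the semantic score favors "stock".
local = {"stock": 0.5, "flower": 0.5}
semantic = {"stock": 0.75, "flower": 0.25}
combined = combine(local, semantic)
```

In this toy case the combined distribution shifts probability mass toward the word favored by the global score, which is the kind of effect that would lower perplexity when the semantic evidence is reliable.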
Bibliographic reference. Bellegarda, Jerome R. (1998): "Multi-Span statistical language modeling for large vocabulary speech recognition", In ICSLP-1998, paper 0134.