Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Context Scope Selection in Multi-Span Statistical Language Modeling

Jerome R. Bellegarda

Spoken Language Group, Apple Computer, Cupertino, CA, USA

A multi-span framework was recently proposed to integrate the various constraints, both local and global, that are present in the language. In this approach, local constraints are captured via n-gram language modeling, while global constraints are taken into account through the use of latent semantic analysis. The complementarity between these two paradigms translates into improved modeling performance, as measured by both perplexity and word error rate reduction. This performance improvement is sensitive to the context scope, i.e., the effective length of the document history used in latent semantic analysis during recognition. Context scope selection via exponential forgetting is proposed to discount older utterances as necessary. Experiments on a subset of the Wall Street Journal task led to a reduction in average word error rate of up to 22.5%.
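
As a rough illustration of exponential forgetting, the sketch below keeps a bag-of-words count of the document history in which every earlier word is discounted by a constant factor each time a new word arrives. The function name `decayed_history`, the parameter `lam`, and the per-word (rather than per-utterance) update are assumptions made for this example; the abstract does not give the paper's exact formulation.

```python
from collections import defaultdict


def decayed_history(words, lam=0.98):
    """Return exponentially discounted word counts for a word history.

    `lam` is a hypothetical forgetting factor in (0, 1]; lam = 1.0 keeps
    the whole history at full weight (unlimited context scope), while
    smaller values shrink the effective scope.
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must be in (0, 1]")
    counts = defaultdict(float)
    for word in words:
        # Discount everything seen so far ...
        for seen in counts:
            counts[seen] *= lam
        # ... then add the newest word at full weight.
        counts[word] += 1.0
    return dict(counts)


if __name__ == "__main__":
    history = "stocks fell as bond yields rose and stocks slid".split()
    print(decayed_history(history, lam=0.9))
```

With this update, a word seen k steps ago contributes lam**k to the history, so recent material dominates. In a multi-span model of the kind described above, such discounted counts would stand in for the raw document-history counts before projection into the latent semantic space; that wiring is not shown here.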

Bibliographic reference. Bellegarda, Jerome R. (1999): "Context scope selection in multi-span statistical language modeling", in EUROSPEECH'99, 2163-2166.