Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

Back-Off Smoothing in a Syntactic Approach to Language Modelling

G. Bordel (1), I. Torrest (1), Enrique Vidal (2)

(1) Universidad del Pais Vasco, Spain
(2) Universidad Politecnica de Valencia, Spain

Statistical language models (N-grams) have been extensively used in Continuous Speech Recognition. In practice, however, only low values of N are used, and as a consequence only very local constraints are represented. It has recently been shown that k-testable Stochastic Languages (k-TS) are strictly equivalent to N-grams, so that choosing k-TS or N-grams could be just a matter of representation. This work presents a grammatical formalism for N-gram Language Modelling, since it offers several advantages. Under the proposed syntactic approach, new smoothing techniques can be considered. Alternative ways to obtain accurate probability distributions to be assigned to unseen N-grams are then proposed, and the new redistribution formulae are established. Moreover, this new perspective suggests that a good inference ability does not need the symmetry principle to be applied globally, but only locally. These proposals have been experimentally compared to the classical back-off method on a task-oriented spontaneous Spanish speech corpus. A decrease in test-set perplexity of up to 6.5% was achieved when the newly proposed approaches were used.
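
For orientation, the classical back-off baseline and the test-set perplexity measure mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' k-TS formulation or their new redistribution formulae, which the abstract does not spell out. It assumes a bigram model, absolute discounting with a discount of 0.5, and a closed vocabulary; all function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_backoff_bigram(sentences, discount=0.5):
    """Collect counts for a bigram model that backs off to unigrams.
    The discount value is an illustrative assumption, not taken from the paper."""
    uni, bi = Counter(), Counter()
    hist = Counter()            # c(h): how often h occurs as a history
    seen = defaultdict(set)     # words observed after each history h
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        uni.update(tokens[1:])  # "<s>" is never predicted, so it is excluded
        for h, w in zip(tokens, tokens[1:]):
            bi[(h, w)] += 1
            hist[h] += 1
            seen[h].add(w)
    return {"uni": uni, "bi": bi, "hist": hist, "seen": seen,
            "total": sum(uni.values()), "d": discount}

def prob(model, h, w):
    """P(w | h): discounted relative frequency for seen bigrams; for unseen
    bigrams, the freed mass is redistributed in proportion to the unigram
    counts of the words not seen after h."""
    uni, d = model["uni"], model["d"]
    c_h = model["hist"].get(h, 0)
    if c_h == 0:                       # unseen history: plain unigram estimate
        return uni[w] / model["total"]
    c_hw = model["bi"][(h, w)]
    if c_hw > 0:
        return (c_hw - d) / c_h
    followers = model["seen"][h]
    freed = d * len(followers) / c_h   # mass removed from the seen bigrams
    unseen_count = model["total"] - sum(uni[v] for v in followers)
    if unseen_count == 0:              # every vocabulary word was seen after h
        return 0.0
    return freed * uni[w] / unseen_count

def perplexity(model, sentences):
    """Test-set perplexity; assumes every test word occurs in the training data."""
    log_sum, n = 0.0, 0
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        for h, w in zip(tokens, tokens[1:]):
            log_sum += math.log2(prob(model, h, w))
            n += 1
    return 2 ** (-log_sum / n)

# Toy corpus (hypothetical), used only to exercise the functions.
train = [["show", "me", "flights"], ["show", "me", "fares"], ["list", "flights"]]
model = train_backoff_bigram(train)
print(prob(model, "show", "me"))            # seen bigram
print(prob(model, "show", "flights"))       # unseen bigram, backed-off estimate
print(perplexity(model, [["list", "fares"]]))
```

In this sketch the probability mass freed by discounting is shared among unseen bigrams according to the lower-order (unigram) distribution. That redistribution step is the part for which the abstract says alternative formulae are proposed.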

Bibliographic reference.  Bordel, G. / Torrest, I. / Vidal, Enrique (1994): "Back-off smoothing in a syntactic approach to language modelling", In ICSLP-1994, 851-854.