Sixth International Conference on Spoken Language Processing (ICSLP 2000)

Beijing, China
October 16-20, 2000

Impact of Bucketing on Performance of Linearly Interpolated Language Models

K. Visweswariah, H. Printz, M. Picheny

IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA

N-gram models are used to model language in various applications. For large vocabularies, even a very large corpus is insufficient to estimate a raw ratio-of-counts trigram model. One common way to overcome this problem is by linear interpolation of the trigram model with lower order models. The interpolation weights can be varied as a function of the current history, to reflect the confidence we have in the estimates of various orders. Since the number of histories is large, we cannot hope to estimate a separate set of weights for each history. Thus sets of histories are tied together, and the same weights are used for all histories within a set. In this paper we study the effect of the algorithm used to tie together the various histories. We report word error rate (WER) results on a large-vocabulary speech recognition task.
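The scheme described above can be sketched in a few lines of Python. This is an illustrative toy, not the paper's exact method: histories are bucketed here by their bigram count (one common tying criterion), and the per-bucket weights are assumed values rather than weights trained on held-out data as they would be in practice.

```python
from collections import Counter

# Toy corpus and raw ratio-of-counts models of orders 1, 2, 3.
corpus = "the cat sat on the mat the cat ate the rat".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
total = len(corpus)

def bucket(history):
    """Tie histories into sets by the count of the bigram history."""
    c = bigrams[history]
    if c >= 2:
        return "frequent"
    elif c == 1:
        return "rare"
    return "unseen"

# Hypothetical per-bucket weights (lambda_3, lambda_2, lambda_1).
# In practice these are estimated on held-out data (e.g. via EM);
# each triple sums to one, and lambda_3 = 0 for unseen histories.
WEIGHTS = {
    "frequent": (0.7, 0.2, 0.1),
    "rare":     (0.4, 0.4, 0.2),
    "unseen":   (0.0, 0.6, 0.4),
}

def prob(w, history):
    """Interpolated trigram probability p(w | history)."""
    h1, h2 = history
    l3, l2, l1 = WEIGHTS[bucket(history)]
    p3 = trigrams[(h1, h2, w)] / bigrams[history] if bigrams[history] else 0.0
    p2 = bigrams[(h2, w)] / unigrams[h2] if unigrams[h2] else 0.0
    p1 = unigrams[w] / total
    return l3 * p3 + l2 * p2 + l1 * p1
```

Because the weights in each bucket sum to one and each component is a proper distribution whenever its weight is nonzero, the interpolated probabilities over the vocabulary sum to one for any given history.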


Bibliographic reference. Visweswariah, K. / Printz, H. / Picheny, M. (2000): "Impact of bucketing on performance of linearly interpolated language models", in Proc. ICSLP-2000, vol. 1, 178-181.