5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Compression Algorithm of Trigram Language Models Based on Maximum Likelihood Estimation

Norimichi Yodo, Kiyohiro Shikano, Satoshi Nakamura

Graduate School of Information Science, Nara Institute of Science and Technology, Japan

In this paper we propose an algorithm for reducing the size of back-off N-gram models while degrading their performance less than the traditional cutoff method. The algorithm is based on Maximum Likelihood (ML) estimation and realizes an N-gram language model with a given number of N-gram probability parameters that minimizes the training-set perplexity. To confirm the effectiveness of our algorithm, we apply it to trigram and bigram models and carry out experiments measuring perplexity and word error rate in a dictation system.
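The core idea (selecting, under a fixed parameter budget, the subset of N-gram probabilities that minimizes training-set perplexity) can be illustrated with a minimal sketch. This is not the authors' exact algorithm: the function `prune_bigrams`, the toy corpus, and the greedy ranking by log-likelihood loss are illustrative assumptions, and back-off weight renormalization is omitted for brevity.

```python
from collections import Counter
import math

def train_counts(tokens):
    """ML (relative-frequency) unigram and bigram counts."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    return uni, bi

def prune_bigrams(tokens, budget):
    """Hypothetical sketch: keep only `budget` bigram parameters,
    choosing the subset that loses the least training-set
    log-likelihood when the remaining bigrams back off to
    unigram probabilities.

    Simplification: back-off weights are not renormalized here,
    so this only approximates the perplexity-optimal selection.
    """
    uni, bi = train_counts(tokens)
    total = sum(uni.values())

    def loss(pair):
        # Training-set log-likelihood lost by dropping (u, v):
        # count(u, v) * (log p(v|u) - log p(v))
        u, v = pair
        p_bi = bi[pair] / uni[u]
        p_uni = uni[v] / total
        return bi[pair] * (math.log(p_bi) - math.log(p_uni))

    # Greedily keep the bigrams whose removal would cost the most.
    kept = sorted(bi, key=loss, reverse=True)[:budget]
    return set(kept)
```

For example, on the toy corpus `"a b a b a c"` with a budget of 2, the frequent bigrams (a, b) and (b, a) survive while the singleton (a, c) is pruned and backs off to the unigram estimate.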


Bibliographic reference.  Yodo, Norimichi / Shikano, Kiyohiro / Nakamura, Satoshi (1998): "Compression algorithm of trigram language models based on maximum likelihood estimation", In ICSLP-1998, paper 0716.