International Symposium on Chinese Spoken Language Processing (ISCSLP 2002)

Taipei, Taiwan
August 23-24, 2002

A Compression Method Used in Language Modeling for Handheld Devices

Genqing Wu, Fang Zheng, Wenhu Wu

Center of Speech Technology, State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing, China

In this paper, a new n-gram language model compression method is proposed for applications on handheld devices such as mobile phones, PDAs, and handheld PCs. Compared with traditional methods, the proposed method compresses the model substantially while preserving good performance. The method has three parts. First, the language model parameters are analyzed in detail, and a criterion based on the probability and importance of n-grams determines which n-grams are kept and which are removed. Second, a curve-based compression function is proposed to compress the n-gram count values in the full language model. Third, a code table is extracted and used to estimate the probabilities of bigrams. Our experiments show that this compression method reduces the language model to only about 1 MB with almost no loss in performance, which makes the language model usable on handheld devices.
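The abstract does not give the form of the compression function or the layout of the code table, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a logarithmic curve that maps each n-gram count to a one-byte code, plus a code table that maps each code back to a representative count for probability estimation. The names `build_code_table`, `compress_count`, and `estimate_bigram_prob`, and the choice of 256 codes, are hypothetical.

```python
import math


def _scale(max_count, num_codes):
    # Step size in the log domain so that max_count maps to the last code.
    return math.log(max_count + 1) / (num_codes - 1)


def build_code_table(max_count, num_codes=256):
    """Code table: code -> representative count (assumed logarithmic curve)."""
    scale = _scale(max_count, num_codes)
    return [math.exp(code * scale) - 1 for code in range(num_codes)]


def compress_count(count, max_count, num_codes=256):
    """Map a raw count to a small integer code using the assumed curve."""
    scale = _scale(max_count, num_codes)
    return round(math.log(count + 1) / scale)


def estimate_bigram_prob(bigram_code, history_code, table):
    """Estimate P(w2 | w1) from the decoded bigram and history counts."""
    history_count = table[history_code]
    if history_count <= 0:
        return 0.0
    return table[bigram_code] / history_count


# Hypothetical counts, for illustration only.
MAX_COUNT = 1000
table = build_code_table(MAX_COUNT)
history_code = compress_count(500, MAX_COUNT)  # count of w1
bigram_code = compress_count(50, MAX_COUNT)    # count of (w1, w2)
prob = estimate_bigram_prob(bigram_code, history_code, table)
```

With this kind of curve, small counts are stored almost exactly and large counts are stored with a bounded relative error, so each count costs one byte instead of a full integer. Pruning of n-grams by probability and importance is not shown here, because the abstract does not state the criterion.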


Bibliographic reference.  Wu, Genqing / Zheng, Fang / Wu, Wenhu (2002): "A compression method used in language modeling for handheld devices", In ISCSLP 2002, paper 117.