7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Improved Katz Smoothing for Language Modeling in Speech Recognition

Genqing Wu (1), Fang Zheng (2), Wenhu Wu (1), Mingxing Xu (1), Ling Jin (1)

(1) Tsinghua University, China; (2) Beijing d-Ear Technologies Co. Ltd., China

This paper proposes a method for improving the canonical Katz back-off smoothing technique in language modeling. The Katz smoothing process is analyzed in detail, and global discounting parameters are selected for discounting. Furthermore, a modified formula for the discounting parameters is proposed, in which the parameters are determined not only by the occurrence counts of the n-gram units but also by the frequencies of their low-order histories. This modification makes the smoothing more reasonable for n-gram units that have homophonic histories (histories with the same pronunciation). The new method is tested on a Chinese Pinyin-to-character conversion system (Pinyin being the pronunciation string), and the results show that it achieves a surprising reduction in both perplexity and Chinese character error rate.
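
For orientation, the discounting step of canonical Katz smoothing can be sketched as follows. This is a minimal illustration of the standard Good-Turing-based discount ratio from the general literature, not the modified formula proposed in the paper, which the abstract does not state. The function name `katz_discounts`, the cutoff `k`, and the toy counts are assumptions made for the example.

```python
from collections import Counter


def katz_discounts(ngram_counts, k=5):
    """Return {r: d_r} for 1 <= r <= k under canonical Katz discounting.

    ngram_counts maps each n-gram to its raw count r. Counts above k are
    treated as reliable and left undiscounted (d_r = 1), so they are omitted.
    """
    # n[r] = number of distinct n-grams that occur exactly r times
    n = Counter(ngram_counts.values())
    if any(n[r] == 0 for r in range(1, k + 2)):
        raise ValueError("count-of-counts n_1 .. n_(k+1) must all be non-zero")

    # Correction term that keeps the mass reserved for unseen n-grams
    # equal to the Good-Turing estimate n_1 / N
    common = (k + 1) * n[k + 1] / n[1]

    discounts = {}
    for r in range(1, k + 1):
        r_star = (r + 1) * n[r + 1] / n[r]  # Good-Turing adjusted count
        discounts[r] = (r_star / r - common) / (1 - common)
    return discounts


# Toy data (hypothetical): 12 n-grams seen once, 4 seen twice, 2 seen three times
toy_counts = {}
for r, how_many in [(1, 12), (2, 4), (3, 2)]:
    for i in range(how_many):
        toy_counts[("w%d" % r, "v%d" % i)] = r

toy_discounts = katz_discounts(toy_counts, k=2)
```

In this sketch the discount ratios depend only on the raw count r. According to the abstract, the proposed method additionally conditions the discounting parameters on the frequencies of the low-order histories.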


Bibliographic reference.  Wu, Genqing / Zheng, Fang / Wu, Wenhu / Xu, Mingxing / Jin, Ling (2002): "Improved Katz smoothing for language modeling in speech recognition", In ICSLP-2002, 925-928.