7th International Conference on Spoken Language Processing
September 16-20, 2002
This paper proposes a new method to improve the canonical Katz back-off smoothing technique in language modeling. The Katz smoothing process is analyzed in detail, and global discounting parameters are selected for discounting. Furthermore, a modified formula for the discounting parameters is proposed, in which the parameters are determined not only by the occurrence counts of the n-gram units but also by the low-order history frequencies. This modification makes the smoothing more reasonable for n-gram units whose histories are homophonic (identical in pronunciation). The new method is tested on a Chinese Pinyin-to-character conversion system (where Pinyin is the pronunciation string), and the results show that the improved method achieves a surprising reduction in both perplexity and Chinese character error rate.
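For orientation, the following is a minimal sketch of the canonical Katz discount ratios that the abstract takes as its starting point. It is not the modified formula proposed in the paper, which the abstract does not state. The function name katz_discounts, the threshold k, and the toy counts are assumptions made for illustration only. With n_r denoting the number of distinct n-grams seen exactly r times and r* = (r + 1) n_{r+1} / n_r the Good-Turing adjusted count, the standard ratio for 1 <= r <= k is d_r = (r*/r - c) / (1 - c), where c = (k + 1) n_{k+1} / n_1.

```python
from collections import Counter


def katz_discounts(ngram_counts, k=5):
    """Return {r: d_r} for 1 <= r <= k using the standard Katz formula.

    ngram_counts maps each n-gram to its raw count. Counts above k are
    treated as reliable and left undiscounted (d_r = 1), so they are
    not included in the result.
    """
    # n[r] = number of distinct n-grams observed exactly r times
    n = Counter(ngram_counts.values())
    if n[1] == 0:
        raise ValueError("no singleton n-grams; Good-Turing estimate undefined")

    # Shared term c = (k + 1) * n_{k+1} / n_1
    c = (k + 1) * n[k + 1] / n[1]
    if c >= 1:
        raise ValueError("count-of-counts too sparse for this k")

    discounts = {}
    for r in range(1, k + 1):
        if n[r] == 0:
            continue  # no n-gram has this count, so no ratio is needed
        r_star = (r + 1) * n[r + 1] / n[r]  # Good-Turing adjusted count
        discounts[r] = (r_star / r - c) / (1 - c)
    return discounts


# Toy data (hypothetical): six n-grams seen once, two seen twice, one seen 3 times
toy_counts = {f"once_{i}": 1 for i in range(6)}
toy_counts.update({"twice_a": 2, "twice_b": 2, "thrice": 3})
toy_discounts = katz_discounts(toy_counts, k=2)
```

In this formulation the count mass removed from n-grams with counts 1 to k equals n_1, the Good-Turing estimate of the mass owed to unseen events. A practical implementation would also need to handle ratios that fall outside the interval (0, 1] on sparse data.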
Bibliographic reference. Wu, Genqing / Zheng, Fang / Wu, Wenhu / Xu, Mingxing / Jin, Ling (2002): "Improved Katz smoothing for language modeling in speech recognition", In ICSLP-2002, 925-928.