Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Combination of Different N-Grams Based on Their Different Assumptions

Gongjun Li, Na Dong, Toshiro Ishikawa

R&D Center, Matsushita Electric (China) Co., Ltd., China

This paper addresses the negative impact that the assumptions artificially introduced by n-grams of different orders have on their performance in natural language processing. To increase the power of the language model, we propose several schemes that combine conventional n-gram language models of different orders by introducing assumption probabilities. The assumption probabilities are estimated using a discriminative estimation criterion. We evaluate the improved n-gram on the task of converting Chinese pinyin to Chinese characters. The experimental results show that the error rate can be reduced remarkably, by up to 55.2%. In addition, the improved language model can solve the data sparsity problem.
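The abstract does not state the combination formula, so the following is only a rough illustration of the general idea of mixing n-gram models of different orders. It assumes a simple weighted mixture of a unigram and a bigram estimate, with a fixed weight standing in for an assumption probability. The function names, the `weight_bigram` parameter, and the toy data are hypothetical and are not taken from the paper, which estimates its probabilities discriminatively.

```python
from collections import Counter


def train_counts(tokens):
    """Collect unigram, bigram, and history counts from a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    # Count each token only where it is followed by another token,
    # so bigram probabilities for a given history sum to 1.
    histories = Counter(tokens[:-1])
    return unigrams, bigrams, histories


def combined_prob(word, prev, counts, weight_bigram):
    """Mix bigram and unigram maximum-likelihood estimates.

    weight_bigram is an assumed, fixed stand-in for the probability
    that the bigram assumption holds; it is not the paper's estimate.
    """
    unigrams, bigrams, histories = counts
    total = sum(unigrams.values())
    p_uni = unigrams[word] / total if total else 0.0
    p_bi = bigrams[(prev, word)] / histories[prev] if histories[prev] else 0.0
    return weight_bigram * p_bi + (1.0 - weight_bigram) * p_uni


counts = train_counts("a b a b a c".split())
seen = combined_prob("b", "a", counts, weight_bigram=0.5)
unseen = combined_prob("c", "b", counts, weight_bigram=0.5)
print(seen, unseen)
```

In this toy setup the bigram ("b", "c") never occurs, yet the mixture still assigns it a nonzero probability through the unigram term, which is the kind of effect a combined model can have on sparse data.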



Bibliographic reference.  Li, Gongjun / Dong, Na / Ishikawa, Toshiro (2000): "Combination of different n-grams based on their different assumptions", In ICSLP-2000, vol.3, 358-361.