Sixth International Conference on Spoken Language Processing
This paper addresses the negative impact that the assumptions artificially introduced by n-grams of different orders have on their performance in natural language processing. To strengthen the modeling of language information, we propose several schemes that combine conventional n-gram language models of different orders by introducing assumption probabilities. The assumption probabilities are estimated using a discriminative estimation criterion. We evaluate the improved n-gram on the task of converting Chinese pinyin to Chinese characters. The experimental results show that the error rate can be reduced by up to 55.2%. In addition, the improved language model can solve the data sparsity problem.
Bibliographic reference. Li, Gongjun / Dong, Na / Ishikawa, Toshiro (2000): "Combination of different n-grams based on their different assumptions", In ICSLP-2000, vol.3, 358-361.