5th International Conference on Spoken Language Processing
This paper examines the main differences between language modelling of Russian and of English. A Russian corpus and a comparable English corpus are described. The effects of the high degree of inflection in Russian, and the relationship between out-of-vocabulary rate and vocabulary size, are investigated. Standard word and class N-gram language modelling techniques are applied to the two corpora, and perplexity results are reported. A novel approach to the modelling of inflected languages is proposed, and its efficacy is compared with that of the other techniques.
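As a rough illustration of the out-of-vocabulary (OOV) rate discussed in the abstract, the sketch below measures the fraction of test tokens that fall outside a vocabulary built from the most frequent training words, for several vocabulary sizes. The function name `oov_rate`, the whitespace tokenisation, and the toy sentences are assumptions made for this example only. They are not taken from the paper and do not reproduce its corpora or results.

```python
from collections import Counter


def oov_rate(train_tokens, test_tokens, vocab_size):
    """Return the fraction of test tokens not covered by the vocabulary.

    The vocabulary is assumed to be the `vocab_size` most frequent
    word types in the training tokens (a common convention, not
    necessarily the one used in the paper).
    """
    counts = Counter(train_tokens)
    vocab = {word for word, _ in counts.most_common(vocab_size)}
    if not test_tokens:
        return 0.0
    misses = sum(1 for token in test_tokens if token not in vocab)
    return misses / len(test_tokens)


# Toy data, purely illustrative.
train = "the cat sat on the mat the dog sat".split()
test = "the dog sat on the rug".split()

for size in (1, 2, 6):
    print(size, oov_rate(train, test, size))
```

On real data, a highly inflected language would typically need a much larger vocabulary than English to reach the same OOV rate, because each lemma appears in many surface forms. The toy data here is too small to show that effect.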
Bibliographic reference. Whittaker, Edward W. D. / Woodland, Philip C. (1998): "Comparison of language modelling techniques for Russian and English", In ICSLP-1998, paper 0967.