ISCA Archive ICSLP 1998

Comparison of language modelling techniques for Russian and English

Edward W. D. Whittaker, Philip C. Woodland

In this paper the main differences between language modelling of Russian and English are examined. A Russian corpus and a comparable English corpus are described. The effects of high inflectionality in Russian and the relationship between the out-of-vocabulary rate and vocabulary size are investigated. Standard word and class N-gram language modelling techniques are applied to the two corpora and perplexity results are reported. A novel approach to the modelling of inflected languages is proposed and its efficacy compared with the other techniques.


doi: 10.21437/ICSLP.1998-676

Cite as: Whittaker, E.W.D., Woodland, P.C. (1998) Comparison of language modelling techniques for Russian and English. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0967, doi: 10.21437/ICSLP.1998-676

@inproceedings{whittaker98b_icslp,
  author={Edward W. D. Whittaker and Philip C. Woodland},
  title={{Comparison of language modelling techniques for Russian and English}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0967},
  doi={10.21437/ICSLP.1998-676}
}