Third European Conference on Speech Communication and Technology

Berlin, Germany
September 22-25, 1993


Language Modelling for CSR of Large Corpus Using Automatic Classification of Words

Michele Jardino, Gilles Adda

LIMSI-CNRS, BP133, Orsay, France

Automatic word classification of a French corpus (2M words) has been performed without any grammatical or semantic assumptions. Training texts and test sets are both extracted from the newspaper "Le Monde". A model assessment is also given, showing the role of the test set size relative to the training set size.

Keywords: Language Model, Simulated Annealing Process, Large Vocabulary Speech Recognition.
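The abstract describes classifying words automatically, and the keywords name simulated annealing as the process. As a rough illustration only, the sketch below assumes that each word is assigned to exactly one class and that the annealing maximizes the training log-likelihood of a class bigram model, P(w_i | w_{i-1}) = P(c_i | c_{i-1}) P(w_i | c_i), with maximum-likelihood estimates. The objective, the move set (reassign one random word to a random class), the cooling schedule, and all function names (`class_bigram_loglik`, `anneal`, `perplexity`) are hypothetical choices for this sketch, not details taken from the paper.

```python
import math
import random
from collections import Counter


def class_bigram_loglik(tokens, word2class):
    """Training log-likelihood of a class bigram model (hypothetical objective):
    log P(w_i | w_{i-1}) = log P(c_i | c_{i-1}) + log P(w_i | c_i), ML estimates."""
    classes = [word2class[w] for w in tokens]
    class_bigrams = Counter(zip(classes, classes[1:]))
    class_hist = Counter(classes[:-1])  # class counts as left context
    class_pred = Counter(classes[1:])   # class counts as predicted class
    word_pred = Counter(tokens[1:])     # word counts as predicted word
    ll = 0.0
    for (c1, _c2), n in class_bigrams.items():
        ll += n * math.log(n / class_hist[c1])            # log P(c2 | c1)
    for w, n in word_pred.items():
        ll += n * math.log(n / class_pred[word2class[w]])  # log P(w | c)
    return ll


def anneal(tokens, n_classes, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over word-to-class assignments (assumed move set:
    move one randomly chosen word to a randomly chosen class)."""
    rng = random.Random(seed)
    vocab = sorted(set(tokens))
    word2class = {w: rng.randrange(n_classes) for w in vocab}
    current = class_bigram_loglik(tokens, word2class)
    best, best_ll = dict(word2class), current
    temp = t0
    for _ in range(steps):
        w = rng.choice(vocab)
        old, new = word2class[w], rng.randrange(n_classes)
        if new == old:
            continue
        word2class[w] = new
        cand = class_bigram_loglik(tokens, word2class)
        delta = cand - current
        # Always accept improvements; accept degradations with prob exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = cand
            if current > best_ll:
                best, best_ll = dict(word2class), current
        else:
            word2class[w] = old  # reject: undo the move
        temp *= cooling  # geometric cooling schedule
    return best, best_ll


def perplexity(tokens, ll):
    """Per-word perplexity over the len(tokens) - 1 predicted positions."""
    return math.exp(-ll / (len(tokens) - 1))


if __name__ == "__main__":
    toks = "le chat mange le poisson le chien mange la viande".split()
    classes, ll = anneal(toks, n_classes=3, steps=500)
    print(classes, round(perplexity(toks, ll), 3))
```

For clarity the sketch recomputes the full likelihood after every move, which costs O(N) per step; on a 2M-word corpus one would instead update the affected class and word counts incrementally. Assessing the model on held-out text, as the abstract describes, would also require smoothing for unseen class bigrams, which is not shown here.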


Bibliographic reference. Jardino, Michele / Adda, Gilles (1993): "Language modelling for CSR of large corpus using automatic classification of words", In EUROSPEECH'93, 1191-1194.