Automatic word classification of a French corpus (2M words) has been performed without any grammatical or semantic assumptions. Training texts and test sets are both extracted from the newspaper "Le Monde". An assessment of the model is also given, showing the role of the test set size relative to the training set size.
Keywords: Language Model, Simulated Annealing Process, Large Vocabulary Speech Recognition.
Bibliographic reference. Jardino, Michele / Adda, Gilles (1993): "Language modelling for CSR of large corpus using automatic classification of words", In EUROSPEECH'93, 1191-1194.