ISCA Archive Eurospeech 2001

Combining word- and class-based language models: a comparative study in several languages using automatic and manual word-clustering techniques

G. Maltese, P. Bravetti, H. Crépy, B. J. Grainger, M. Herzog, F. Palou

This paper compares various class-based language models when combined with a word-based trigram language model by linear interpolation. For class-based language models whose classes are derived automatically, we present a comparative analysis in five languages (French, British English, German, Italian, and Spanish). For classes corresponding to parts of speech, we present results for three languages (British English, French, and Italian). For each language, we present results for varying training corpus size and test script complexity. We achieved significant reductions in perplexity and word error rate for all five languages and for several language models and recognition tasks. This work extends previous research by covering more languages and by showing a positive impact of these techniques with very large corpora, whereas prior work mostly focused on the data sparseness issues caused by small corpora.
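
The linear interpolation mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the common form P(w | h) = lam * P_word(w | h) + (1 - lam) * P_class(w | h), and the function names, the weight 0.7, and the per-word probabilities are all hypothetical values chosen only to show the mechanics.

```python
import math


def interpolate(p_word, p_class, lam):
    """Linearly interpolate two probability estimates for the same word.

    p_word  -- probability from the word-based trigram model (assumed given)
    p_class -- probability from the class-based model (assumed given)
    lam     -- interpolation weight for the word-based model, in [0, 1]
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_word + (1.0 - lam) * p_class


def perplexity(probs):
    """Perplexity of a test sequence from its per-word probabilities."""
    log_sum = sum(math.log(p) for p in probs)
    return math.exp(-log_sum / len(probs))


# Made-up per-word probabilities for a three-word test sequence.
word_probs = [0.20, 0.01, 0.05]   # hypothetical word-based trigram estimates
class_probs = [0.10, 0.04, 0.05]  # hypothetical class-based estimates

mixed_probs = [interpolate(pw, pc, 0.7)
               for pw, pc in zip(word_probs, class_probs)]

print("word-only perplexity:", perplexity(word_probs))
print("interpolated perplexity:", perplexity(mixed_probs))
```

In practice the weight would be tuned on held-out data rather than fixed by hand; whether the interpolated model improves perplexity depends entirely on the two underlying models and the test material.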


doi: 10.21437/Eurospeech.2001-5

Cite as: Maltese, G., Bravetti, P., Crépy, H., Grainger, B.J., Herzog, M., Palou, F. (2001) Combining word- and class-based language models: a comparative study in several languages using automatic and manual word-clustering techniques. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 21-24, doi: 10.21437/Eurospeech.2001-5

@inproceedings{maltese01_eurospeech,
  author={G. Maltese and P. Bravetti and H. Crépy and B. J. Grainger and M. Herzog and F. Palou},
  title={{Combining word- and class-based language models: a comparative study in several languages using automatic and manual word-clustering techniques}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={21--24},
  doi={10.21437/Eurospeech.2001-5}
}