Third European Conference on Speech Communication and Technology

Berlin, Germany
September 22-25, 1993


A Fast Multilingual Probabilistic Tagger

Evangelos Dermatas, George Kokkinakis

Department of Electrical Engineering, Wire Communications Lab., University of Patras, Patras, Greece

This paper presents and compares two versions of a novel automatic tagging system which is both language and tagset independent and has close to real-time response on personal computers. The system's prediction model is based on HMM chain theory and tags each word of a text, including unknown words, using the Viterbi algorithm. The first version carries out floating-point arithmetic operations, while in the second version these operations have been transformed to fixed-point ones. A significant reduction in response time is thus achieved with negligible influence (<0.01%) on the prediction accuracy. The tagging system was tested on newspaper texts in 7 European languages, using various sets of grammatical categories and texts with and without unknown words. The results proved to be satisfactory.
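The idea of running the same Viterbi search with floating-point and with fixed-point scores can be illustrated with a minimal sketch. It is not the authors' implementation: the toy tagset, the probabilities, the scale factor `SCALE`, and the probability floor `UNK_PROB` used for unknown words are all assumptions made for illustration. Probabilities are converted to log scores so that the search only needs additions and comparisons, and the fixed-point variant simply rounds the scaled log scores to integers.

```python
import math

SCALE = 1 << 10      # hypothetical fixed-point scale factor
UNK_PROB = 1e-6      # hypothetical floor for unseen word/tag pairs


def log_scores(probs):
    """Convert a (possibly nested) dict of probabilities to float log scores."""
    return {k: log_scores(v) if isinstance(v, dict) else math.log(v)
            for k, v in probs.items()}


def fixed_scores(probs, scale=SCALE):
    """Convert probabilities to integer scores: round(log(p) * scale)."""
    return {k: fixed_scores(v, scale) if isinstance(v, dict)
            else round(math.log(v) * scale)
            for k, v in probs.items()}


def viterbi(words, tags, start, trans, emit, unk):
    """Return the best tag sequence; all scores are additive."""
    # Best score of a path ending in each tag at the first word.
    best = {t: start[t] + emit[t].get(words[0], unk) for t in tags}
    back = []  # one backpointer table per later word
    for w in words[1:]:
        prev, best, ptr = best, {}, {}
        for t in tags:
            p = max(tags, key=lambda s: prev[s] + trans[s][t])
            best[t] = prev[p] + trans[p][t] + emit[t].get(w, unk)
            ptr[t] = p
        back.append(ptr)
    # Follow the backpointers from the best final tag.
    path = [max(tags, key=lambda t: best[t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Toy model (all numbers are invented for illustration).
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
EMIT = {
    "DET": {"the": 0.9, "a": 0.1},
    "NOUN": {"dog": 0.5, "barks": 0.1, "walk": 0.4},
    "VERB": {"barks": 0.6, "walk": 0.4},
}


def tag(words, convert, unk):
    """Tag a sentence using the given score conversion (float or fixed)."""
    return viterbi(words, TAGS, convert(START), convert(TRANS),
                   convert(EMIT), unk)


def tag_float(words):
    return tag(words, log_scores, math.log(UNK_PROB))


def tag_fixed(words):
    return tag(words, fixed_scores, round(math.log(UNK_PROB) * SCALE))
```

Calling `tag_float(["the", "dog", "barks"])` and `tag_fixed(["the", "dog", "barks"])` runs the same search on float and on integer scores. In this sketch the integer version replaces every floating-point addition with an integer one; how large the speed-up and the accuracy loss are depends on the scale factor and the hardware, which the sketch does not measure.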

Keywords: Probabilistic tagging, taggers, Viterbi algorithm, HMM, natural language processing.


Bibliographic reference.  Dermatas, Evangelos / Kokkinakis, George (1993): "A fast multilingual probabilistic tagger", In EUROSPEECH'93, 1323-1326.