Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Problems of Creating a Flexible E-mail Reader for Hungarian

Géza Németh (1), Csaba Zainkó (1), Gábor Olaszy (2), Gábor Prószéky (3)

(1) Department of Telecommunications & Telematics, Technical University of Budapest, Hungary
(2) Phonetics Laboratory, Linguistics Institute of the Hungarian Academy of Sciences, Budapest, Hungary
(3) Morphologic, Budapest, Hungary

This paper reports the problems encountered during the development of a Hungarian e-mail reader. Hungarian is special for two reasons: it uses diacritics on several vowels (á, é, í, ó, ö, ő, ú, ü, ű), and it is agglutinative, which greatly increases the number of valid word forms. The emphasis is on text-processing issues such as language detection and the regeneration of diacritics from stripped-down 7-bit ASCII forms. Test results for different solutions on real-life e-mail data are also presented.
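To make the diacritic regeneration problem concrete, the following is a minimal Python sketch and not the method used in the paper. It assumes a small hypothetical word list (`LEXICON`) and illustrative helper names (`strip_diacritics`, `build_index`, `restore_candidates`). It shows how stripping to 7-bit ASCII collapses distinct Hungarian words onto one form, so regeneration needs to choose among several candidates.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Reduce accented letters to their unaccented base letters."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(lexicon):
    """Map each stripped form to every lexicon word that produces it."""
    index = {}
    for word in lexicon:
        index.setdefault(strip_diacritics(word), []).append(word)
    return index


def restore_candidates(stripped_word, index):
    """Return the accented candidates for a stripped word.

    Unknown words are returned unchanged (an assumption of this sketch).
    """
    return index.get(stripped_word, [stripped_word])


# Hypothetical toy lexicon, for illustration only.
LEXICON = ["kor", "kór", "kör", "város", "olvasó"]
INDEX = build_index(LEXICON)

print(strip_diacritics("árvíztűrő tükörfúrógép"))
print(restore_candidates("kor", INDEX))
print(restore_candidates("varos", INDEX))
```

Here the stripped form "kor" maps back to three different words, which is why a lookup alone cannot resolve every case. Because Hungarian is agglutinative, a finite word list of this kind would also miss many valid inflected forms, so a practical system would likely need morphological analysis or context on top of such a lookup.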

Bibliographic reference. Németh, Géza / Zainkó, Csaba / Olaszy, Gábor / Prószéky, Gábor (1999): "Problems of creating a flexible e-mail reader for Hungarian", in EUROSPEECH'99, 939-942.