Sixth European Conference on Speech Communication and Technology
This paper reports the problems encountered during the development of a Hungarian e-mail reader. Hungarian poses special difficulties for two reasons: several vowels carry diacritics (á, é, í, ó, ö, ő, ú, ü, ű), and the agglutinative nature of the language greatly increases the number of valid word forms. Emphasis is placed on text-processing issues such as language detection and the regeneration of diacritics from stripped-down 7-bit ASCII forms. Test results for different solutions on real-life e-mail data are also presented.
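To make the diacritic regeneration problem concrete, the following is a minimal sketch in Python, not the method used in the paper. It strips diacritics to obtain the 7-bit ASCII form and then looks up every accented candidate in a small word list. The function names and the toy lexicon are hypothetical and chosen only for illustration.

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(word: str) -> str:
    """Return the 7-bit ASCII form of a word by dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(lexicon):
    """Map each stripped form to the set of accented words that produce it."""
    index = defaultdict(set)
    for word in lexicon:
        index[strip_diacritics(word)].add(word)
    return index


def restore_candidates(ascii_word, index):
    """List all known accented forms of a stripped word.

    Falls back to the input itself when the word is not in the index.
    """
    key = ascii_word.lower()
    return sorted(index.get(key, {key}))


# Hypothetical toy lexicon (assumption): real e-mail text needs far more coverage.
LEXICON = ["kor", "kór", "kör", "őrült", "örült"]
INDEX = build_index(LEXICON)

print(restore_candidates("kor", INDEX))
print(restore_candidates("orult", INDEX))
```

Even this toy lexicon shows the ambiguity: one stripped form can map to several valid words, so the right candidate has to be chosen from context. A plain word list like this would also not scale, because the agglutinative morphology multiplies the number of valid word forms.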
Bibliographic reference. Németh, Géza / Zainkó, Csaba / Olaszy, Gábor / Prószéky, Gábor (1999): "Problems of creating a flexible e-mail reader for Hungarian", in EUROSPEECH'99, 939-942.