ISCA Archive Eurospeech 1999

Problems of creating a flexible e-mail reader for Hungarian

Géza Németh, Csaba Zainkó, Gábor Olaszy, Gábor Prószéky

This paper reports the problems encountered during the development of a Hungarian e-mail reader. Hungarian is special for two reasons: it uses diacritics on several vowels (á, é, í, ó, ö, ő, ú, ü, ű), and it is agglutinative, which greatly increases the number of valid word forms. Emphasis is placed on text-processing issues such as language detection and diacritic regeneration from stripped-down 7-bit ASCII forms. Test results for different solutions on real-life e-mail data are also presented.
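
To illustrate why diacritic regeneration is hard, the sketch below strips diacritics down to 7-bit ASCII and then looks up every accented word that could have produced a given stripped form. It is a minimal illustration and not the method used in the paper: the function names and the tiny `LEXICON` are hypothetical, and a real system would need a far larger word list plus some way to rank candidates in context.

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(text: str) -> str:
    """Reduce accented letters to their unaccented base letters."""
    # NFD splits e.g. "ő" into "o" plus a combining double acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical toy lexicon, for illustration only.
LEXICON = ["kor", "kór", "kör", "őr", "és", "levél"]


def build_index(words):
    """Map each stripped form to all lexicon words that collapse onto it."""
    index = defaultdict(list)
    for word in words:
        index[strip_diacritics(word)].append(word)
    return index


def candidates(ascii_word, index):
    """Return possible accented forms; fall back to the input if unknown."""
    return index.get(ascii_word, [ascii_word])


if __name__ == "__main__":
    index = build_index(LEXICON)
    for token in ["kor", "level", "xyz"]:
        print(token, "->", candidates(token, index))
```

In this toy lexicon the stripped form "kor" maps back to three distinct words, so a lookup alone cannot decide which one the writer meant. Agglutination compounds the problem, because listing every inflected form in such a lexicon quickly becomes impractical.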


doi: 10.21437/Eurospeech.1999-229

Cite as: Németh, G., Zainkó, C., Olaszy, G., Prószéky, G. (1999) Problems of creating a flexible e-mail reader for hungarian. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 939-942, doi: 10.21437/Eurospeech.1999-229

@inproceedings{nemeth99_eurospeech,
  author={Géza Németh and Csaba Zainkó and Gábor Olaszy and Gábor Prószéky},
  title={{Problems of creating a flexible e-mail reader for hungarian}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={939--942},
  doi={10.21437/Eurospeech.1999-229}
}