8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Building an Information Retrieval System for Serbian - Challenges and Solutions

Miroslav Martinović (1), Srdjan Vesić (2), Goran Rakić (2)

(1) College of New Jersey, USA
(2) University of Belgrade, Serbia

We describe the challenges encountered while building an information retrieval system for the Serbian language and present the approaches we designed and adopted to handle them. As the backbone of our system, we used the SMART retrieval system, which we augmented with features needed to deal with the specifics of the Serbian alphabet. In addition, the morphological richness of the language made the text preprocessing phase particularly consequential. For this phase, we devised two algorithms, which increased retrieval precision by 14% and 27%, respectively. Testing was conducted on the two-gigabyte EBART collection of Serbian newspaper articles.

Bibliographic reference.  Martinović, Miroslav / Vesić, Srdjan / Rakić, Goran (2007): "Building an information retrieval system for Serbian - challenges and solutions", In INTERSPEECH-2007, 1513-1516.