8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

IceNLP: A Natural Language Processing Toolkit for Icelandic

Hrafn Loftsson (1), Eiríkur Rögnvaldsson (2)

(1) Reykjavik University, Iceland
(2) University of Iceland, Iceland

Icelandic is a morphologically complex language, for which language technology resources are scarce. Only a few years ago, it could be stated that language technology was practically non-existent in Iceland. In this paper, we describe the development of an NLP toolkit for processing the language, the challenges faced and the decisions made during development. The current version of the toolkit consists of a tokeniser/sentence segmenter, a morphological analyser, a linguistic rule-based tagger, and a finite-state parser. The development of our toolkit is a step towards building a Basic Language Resource Kit (BLARK) for the Icelandic language.

Bibliographic reference. Loftsson, Hrafn / Rögnvaldsson, Eiríkur (2007): "IceNLP: a natural language processing toolkit for Icelandic", in INTERSPEECH-2007, 1533-1536.