8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Accelerating the Annotation of Lexical Data for Less-Resourced Languages

Gerhard B. van Huyssteen, Martin J. Puttkammer

North-West University, South Africa

The development of digital resources is an expensive and time-consuming endeavor, especially for less-resourced languages. In this paper, we describe TurboAnnotate, a freely available, open-source system for bootstrapping linguistic data for machine-learning purposes, as well as for manually creating gold standards or other annotated lists. We give a detailed description of the tool's design and functionality, focusing on how it addresses the requirements of end-users. We indicate that TurboAnnotate promises not only to help increase the accuracy of human annotators, but also to greatly reduce the time they need to spend on annotation.
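The abstract does not specify TurboAnnotate's learner or annotation format. The Python sketch below is only a rough illustration of the general bootstrapping cycle it refers to. In each round, the system proposes annotations, a human corrects them, and the corrected data is fed back into the learner. The sketch uses a toy memory-based classifier over character windows. The classifier, the per-character label format, the `vowel_rule` oracle that stands in for a human annotator, and all function names are hypothetical assumptions. None of them describe TurboAnnotate's actual implementation.

```python
from collections import Counter, defaultdict


def windows(word, size=2):
    """Yield a fixed-width character context centred on each position."""
    padded = "_" * size + word + "_" * size
    for i in range(len(word)):
        yield padded[i : i + 2 * size + 1]


class WindowClassifier:
    """Hypothetical toy learner: remembers label counts per context window."""

    def __init__(self):
        self.memory = defaultdict(Counter)  # context -> label counts
        self.default = Counter()            # overall label counts (fallback)

    def train(self, data):
        # data: (word, labels) pairs, one label character per word character
        for word, labels in data:
            for ctx, lab in zip(windows(word), labels):
                self.memory[ctx][lab] += 1
                self.default[lab] += 1

    def predict(self, word):
        fallback = self.default.most_common(1)[0][0] if self.default else "0"
        out = []
        for ctx in windows(word):
            seen = self.memory.get(ctx)  # .get() does not create new entries
            out.append(seen.most_common(1)[0][0] if seen else fallback)
        return "".join(out)


def vowel_rule(word):
    """Hypothetical gold standard: mark a boundary after any non-final vowel."""
    return "".join(
        "1" if c in "aeiou" and i < len(word) - 1 else "0"
        for i, c in enumerate(word)
    )


def bootstrap(words, oracle, batch_size=2):
    """Propose labels, let the oracle correct them, then learn from the batch.

    Returns the annotated data and the number of corrected labels per batch,
    a rough proxy for the human effort spent on that batch.
    """
    clf = WindowClassifier()
    annotated, corrections = [], []
    for start in range(0, len(words), batch_size):
        batch_data, fixes = [], 0
        for w in words[start : start + batch_size]:
            proposal = clf.predict(w)
            gold = oracle(w)  # stands in for the human annotator's correction
            fixes += sum(p != g for p, g in zip(proposal, gold))
            batch_data.append((w, gold))
        clf.train(batch_data)  # label counts are additive, so train incrementally
        annotated.extend(batch_data)
        corrections.append(fixes)
    return annotated, corrections


if __name__ == "__main__":
    data, fixes = bootstrap(["banana", "banana"], vowel_rule, batch_size=1)
    print(fixes)  # [2, 0]: the repeated word needs no corrections
```

In this toy setting, human effort shrinks as the learner's memory grows. The second occurrence of a word is predicted without errors, so the annotator only has to confirm it. A real system would use a stronger learner that generalises to unseen contexts rather than relying on exact window matches.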


Bibliographic reference.  Huyssteen, Gerhard B. van / Puttkammer, Martin J. (2007): "Accelerating the annotation of lexical data for less-resourced languages", In INTERSPEECH-2007, 1505-1508.