5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Towards a Formal Framework for Linguistic Annotations

Steven Bird, Mark Liberman

LDC, University of Pennsylvania, USA

"Linguistic annotation" is a term covering any transcription, translation or annotation of textual data or recorded linguistic signals. While there are several ongoing efforts to provide formats and tools for such annotations and to publish annotated linguistic databases, the lack of widely accepted standards is becoming a critical problem. Proposed standards, to the extent they exist, have focussed on file formats. This paper focuses instead on the logical structure of linguistic annotations. We survey a wide variety of annotation formats and demonstrate a common conceptual core. This provides the foundation for an algebraic framework which encompasses the representation, archiving and query of linguistic annotations, while remaining consistent with many alternative file formats.

Bibliographic reference. Bird, Steven / Liberman, Mark (1998): "Towards a formal framework for linguistic annotations", in ICSLP-1998, paper 0774.