'Linguistic annotation' is a term covering any transcription, translation or annotation of textual data or recorded linguistic signals. While there are several ongoing efforts to provide formats and tools for such annotations and to publish annotated linguistic databases, the lack of widely accepted standards is becoming a critical problem. Proposed standards, to the extent they exist, have focussed on file formats. This paper focuses instead on the logical structure of linguistic annotations. We survey a wide variety of annotation formats and demonstrate a common conceptual core. This provides the foundation for an algebraic framework which encompasses the representation, archiving and query of linguistic annotations, while remaining consistent with many alternative file formats.
Cite as: Bird, S., Liberman, M. (1998) Towards a formal framework for linguistic annotations. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0774
@inproceedings{bird98_icslp,
  author={Steven Bird and Mark Liberman},
  title={{Towards a formal framework for linguistic annotations}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0774}
}