This paper presents an empirical study on the annotation of discourse units in spoken dialogues. The goal of this research is to examine whether task-oriented human-human dialogues can be structured as sequences of a small number of individual discourse segments that can be reliably end-pointed. The data used for this study is a corpus of 18 orthographic transcriptions of actual telephone conversations between customers and travel agents or Yellow Pages operators. We propose a general agreement metric derived from the kappa coefficient and apply it to measure the level of agreement among human coders in bracketing discourse segments. Despite the apparent difficulty of this annotation task, we show that a level of agreement around 60% can be reached among at least three out of five coders with variable levels of expertise, using a minimal and theory-neutral set of annotation instructions.
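The abstract describes an agreement metric derived from the kappa coefficient. As a point of reference only, the standard two-coder form of this statistic (Cohen's kappa) compares observed agreement against the agreement expected by chance; the sketch below is an illustration of that baseline formula, not the paper's generalized multi-coder metric. The example labels are hypothetical.

```python
# Illustrative sketch: Cohen's kappa for two coders, the baseline
# chance-corrected agreement statistic from which metrics like the
# one described in the abstract are derived.
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both coders labeled the same.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each coder's marginal label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two coders marking utterances as a segment
# boundary (1) or not (0).
a = [1, 0, 0, 1, 1, 0, 0, 1]
b = [1, 0, 1, 1, 0, 0, 0, 1]
print(round(cohen_kappa(a, b), 3))  # -> 0.5
```

Kappa is 0 when agreement is no better than chance and 1 for perfect agreement, which is why it is preferred over raw percent agreement for annotation tasks like segment bracketing.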
Bibliographic reference. Flammia, Giovanni / Zue, Victor (1995): "Empirical evaluation of human performance and agreement in parsing discourse constituents in spoken dialogue", In EUROSPEECH-1995, 1965-1968.