Interspeech'2005 - Eurospeech
The French Technolangue MEDIA-EVALDA project aims to evaluate spoken understanding approaches. This paper describes the semantic annotation scheme of a common dialog corpus, which will be used for developing and evaluating spoken understanding models and for linguistic studies. A common semantic representation has been formalized and agreed upon by the consortium. Each utterance is divided into semantic segments, and each segment is annotated with a 5-tuple containing the mode, the attribute name representing the underlying concept, the normalized form of the attribute, the list of related segments, and an optional comment about the annotation. Periodic inter-annotator agreement studies indicate that the annotations are of good quality, with an agreement of almost 90% on mode and attribute identification. An analysis of the semantic content of 12,292 annotated client utterances shows that only 14.1% of the observed attributes are domain-dependent and that the semantic dictionary ensures good coverage of the task.
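As an illustration of the 5-tuple described above, the following minimal Python sketch models one annotated segment and computes a simple agreement rate on mode and attribute between two annotators. It is not the project's actual format or metric: the class name `SemanticSegment`, the field names, the function `mode_attribute_agreement`, the mode symbols, and the attribute names and values in the example are all hypothetical, and the sketch assumes both annotators work from the same segmentation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SemanticSegment:
    """One semantic segment with its 5-tuple (all field names are hypothetical)."""
    words: str                    # the segment text itself, kept for readability
    mode: str                     # mode of the segment (symbols here are invented)
    attribute: str                # attribute name representing the concept
    value: str                    # normalized form of the attribute
    links: List[int] = field(default_factory=list)  # indices of related segments
    comment: Optional[str] = None                   # optional annotator comment


def mode_attribute_agreement(a: List[SemanticSegment],
                             b: List[SemanticSegment]) -> float:
    """Fraction of aligned segments on which both mode and attribute match.

    Assumes the two annotations share the same segmentation, which is a
    simplification made for this sketch.
    """
    if len(a) != len(b):
        raise ValueError("annotations must cover the same segments")
    if not a:
        return 0.0
    matches = sum(
        1 for x, y in zip(a, b)
        if (x.mode, x.attribute) == (y.mode, y.attribute)
    )
    return matches / len(a)


# Invented example: two annotators label the same four segments.
annotator_a = [
    SemanticSegment("I would like", "+", "command", "request"),
    SemanticSegment("a double room", "+", "room-type", "double", links=[0]),
    SemanticSegment("not in Paris", "-", "city", "paris"),
    SemanticSegment("for two nights", "+", "stay-length", "2"),
]
annotator_b = [
    SemanticSegment("I would like", "+", "command", "request"),
    SemanticSegment("a double room", "+", "room-type", "double", links=[0]),
    SemanticSegment("not in Paris", "+", "city", "paris"),  # disagrees on mode
    SemanticSegment("for two nights", "+", "stay-length", "2"),
]

agreement = mode_attribute_agreement(annotator_a, annotator_b)
print(f"mode+attribute agreement: {agreement:.2f}")
```

In this invented example the annotators disagree on the mode of one segment out of four, so the computed agreement is 0.75. The paper's own agreement studies may use a different alignment and scoring procedure.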
Bibliographic reference. Bonneau-Maynard, H. / Rosset, Sophie / Ayache, C. / Kuhn, A. / Mostefa, Djamel (2005): "Semantic annotation of the French MEDIA dialog corpus", in INTERSPEECH-2005, 3457-3460.