ISCA Archive Interspeech 2005

Semantic annotation of the French media dialog corpus

H. Bonneau-Maynard, Sophie Rosset, C. Ayache, A. Kuhn, Djamel Mostefa

The French Technolangue MEDIA-EVALDA project aims to evaluate spoken understanding approaches. This paper describes the semantic annotation scheme of a common dialog corpus which will be used for developing and evaluating spoken understanding models and for linguistic studies. A common semantic representation has been formalized and agreed upon by the consortium. Each utterance is divided into semantic segments, and each segment is annotated with a 5-tuple containing the mode, the attribute name representing the underlying concept, the normalized form of the attribute, a list of related segments, and an optional comment about the annotation. Periodic inter-annotator agreement studies demonstrate that the annotations are of good quality, with an agreement of almost 90% on mode and attribute identification. An analysis of the semantic content of 12292 annotated client utterances shows that only 14.1% of the observed attributes are domain-dependent and that the semantic dictionary ensures good coverage of the task.
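
As an illustration only, the five-element segment annotation described above could be held in a small record type, as in the Python sketch below. The class name, field names, the mode symbols ("+", "-"), and the attribute names in the usage note are hypothetical and are not taken from the paper. The `agreement` function is a simple exact-match rate on mode and attribute over identically segmented utterances; it is an assumption for illustration, not the agreement measure used by the authors.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SemanticSegment:
    """One annotated semantic segment (hypothetical field names)."""
    mode: str                       # hypothetical encoding, e.g. "+" or "-"
    attribute: str                  # attribute name representing the concept
    value: str                      # normalized form of the attribute
    related: List[int] = field(default_factory=list)  # indices of related segments
    comment: Optional[str] = None   # optional comment about the annotation


def agreement(a: List[SemanticSegment], b: List[SemanticSegment]) -> float:
    """Share of aligned segments where both annotators chose the same
    mode and attribute. Assumes both annotators used the same segmentation."""
    if len(a) != len(b):
        raise ValueError("annotations must cover the same segments")
    if not a:
        return 1.0
    matches = sum(
        1 for x, y in zip(a, b)
        if (x.mode, x.attribute) == (y.mode, y.attribute)
    )
    return matches / len(a)
```

For example, two annotators who label the same two segments and differ only on the mode of the second one would obtain `agreement(...) == 0.5` under this sketch, using made-up attribute names such as `"command"` and `"city"`.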

doi: 10.21437/Interspeech.2005-312

Cite as: Bonneau-Maynard, H., Rosset, S., Ayache, C., Kuhn, A., Mostefa, D. (2005) Semantic annotation of the French media dialog corpus. Proc. Interspeech 2005, 3457-3460, doi: 10.21437/Interspeech.2005-312

  author={H. Bonneau-Maynard and Sophie Rosset and C. Ayache and A. Kuhn and Djamel Mostefa},
  title={{Semantic annotation of the French media dialog corpus}},
  booktitle={Proc. Interspeech 2005},