Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Semantic Annotation of the French Media Dialog Corpus

H. Bonneau-Maynard (1), Sophie Rosset (1), C. Ayache (2), A. Kuhn (2), Djamel Mostefa (2)

(1) LIMSI-CNRS, Orsay, France; (2) ELDA, Paris, France

The French Technolangue MEDIA-EVALDA project aims to evaluate spoken language understanding approaches. This paper describes the semantic annotation scheme of a common dialog corpus that will be used to develop and evaluate spoken language understanding models and for linguistic studies. A common semantic representation has been formalized and agreed upon by the consortium. Each utterance is divided into semantic segments, and each segment is annotated with a 5-tuple containing the mode, the attribute name representing the underlying concept, the normalized form of the attribute, the list of related segments, and an optional comment about the annotation. Periodic inter-annotator agreement studies show that the annotations are of good quality, with an agreement of almost 90% on mode and attribute identification. An analysis of the semantic content of 12,292 annotated client utterances shows that only 14.1% of the observed attributes are domain-dependent and that the semantic dictionary provides good coverage of the task.
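To make the per-segment 5-tuple concrete, here is a minimal Python sketch of one possible in-memory representation, plus a simple agreement measure on (mode, attribute) pairs. It rests on several assumptions that are not part of the abstract. The class and function names are hypothetical. The mode labels ("+", "?") and attribute names ("command-tache", "nombre-chambre", "localisation-ville") are illustrative only. Agreement is computed as raw observed agreement over segments that are assumed to be already aligned one-to-one between annotators. The paper's actual agreement measure and segment-alignment procedure may differ.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class SemanticSegment:
    """One semantic segment: its words plus the 5-tuple described above.

    Field names are hypothetical; the mode label set is an assumption.
    """
    words: str                      # surface words covered by the segment
    mode: str                       # mode label, e.g. "+" (illustrative)
    attribute: str                  # attribute name for the underlying concept
    value: Optional[str] = None     # normalized form of the attribute
    links: Tuple[int, ...] = ()     # indices of related segments in the utterance
    comment: Optional[str] = None   # optional annotator comment


def observed_agreement(a: Sequence[SemanticSegment],
                       b: Sequence[SemanticSegment]) -> float:
    """Fraction of aligned segment pairs with identical (mode, attribute).

    Assumes the two annotations are already aligned one-to-one; this is a
    simple percent agreement, not a chance-corrected measure such as kappa.
    """
    if len(a) != len(b):
        raise ValueError("segment lists must be aligned one-to-one")
    if not a:
        return 1.0
    matches = sum(1 for x, y in zip(a, b)
                  if (x.mode, x.attribute) == (y.mode, y.attribute))
    return matches / len(a)


# Two hypothetical annotations of the same client utterance.
ann1 = [
    SemanticSegment("je voudrais réserver", "+", "command-tache", "reservation"),
    SemanticSegment("deux chambres", "+", "nombre-chambre", "2"),
    SemanticSegment("à Paris", "+", "localisation-ville", "paris"),
]
ann2 = [
    SemanticSegment("je voudrais réserver", "+", "command-tache", "reservation"),
    SemanticSegment("deux chambres", "+", "nombre-chambre", "2"),
    SemanticSegment("à Paris", "?", "localisation-ville", "paris"),
]

print(observed_agreement(ann1, ann2))  # 2 of 3 segments agree on mode and attribute
```

Comparing only (mode, attribute) mirrors the abstract's focus on mode and attribute identification. Extending the comparison to the normalized value or to the related-segment links would give a stricter measure.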


Bibliographic reference.  Bonneau-Maynard, H. / Rosset, Sophie / Ayache, C. / Kuhn, A. / Mostefa, Djamel (2005): "Semantic annotation of the French media dialog corpus", In INTERSPEECH-2005, 3457-3460.