INTERSPEECH 2012
13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Portability of Semantic Annotations for Fast Development of Dialogue Corpora

Bassam Jabaian (1,2), Fabrice Lefèvre (1), Laurent Besacier (2)

(1) LIA, University of Avignon, Avignon, France
(2) LIG, Joseph Fourier University, Grenoble, France

The generalization of spoken dialogue systems increases the need for fast development of spoken language understanding modules for semantic tagging of speakers' turns. Statistical methods perform well on this task but require large corpora for training. Collecting such corpora is expensive in both time and human expertise. In this paper we propose a semi-automatic annotation process for fast production of dialogue corpora. The approach consists of automatically pre-annotating the corpus and then manually correcting the annotation. To perform the pre-annotation, we propose to port an existing corpus and adapt it to the new data. The French MEDIA dialogue corpus is used as a starting point to produce two new corpora: one for a new language (Italian) and another for a new domain (theatre ticket reservation). We show that automatic pre-annotation leads to a significant gain in productivity compared to fully manual annotation, and thus makes it possible to derive new adaptation data that can be used to further improve the systems.

Index Terms: Spoken Dialogue Systems, Spoken Language Understanding, Language Portability, Statistical Machine Translation

Bibliographic reference.  Jabaian, Bassam / Lefèvre, Fabrice / Besacier, Laurent (2012): "Portability of semantic annotations for fast development of dialogue corpora", In INTERSPEECH-2012, 214-217.