15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Cross-Language Transfer of Semantic Annotation via Targeted Crowdsourcing

Shammur Absar Chowdhury (1), Arindam Ghosh (1), Evgeny A. Stepanov (1), Ali Orkan Bayer (1), Giuseppe Riccardi (1), Ioannis Klasinas (2)

(1) Università di Trento, Italy
(2) Technical University of Crete, Greece

The development of a natural language speech application requires semantic annotation. Moreover, multilingual porting of speech applications increases the cost and complexity of the annotation task. In this paper we address the problem of transferring the semantic annotation of a source language corpus to a low-resource target language via crowdsourcing. Current crowdsourcing approaches face several problems. First, the available crowdsourcing platforms have a skewed distribution of language speakers. Second, speech applications require domain-specific knowledge. Third, the lack of reference target language annotation makes crowdsourcing worker control very difficult. We address these issues on the task of cross-language transfer of domain-specific semantic annotation from an Italian spoken language corpus to Greek, via targeted crowdsourcing. The issue of domain knowledge transfer is addressed by priming the workers with the source language concepts. The lack of reference annotation is handled with a consensus-based annotation algorithm. The quality of annotation transfer is assessed using source language references and inter-annotator agreement. We demonstrate that the proposed computational methodology is viable and achieves acceptable annotation quality.
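
The abstract does not specify the consensus algorithm or the agreement measure, so the following is only a minimal sketch of one common way such a step could look: majority voting over the labels that several workers assign to the same item, plus a simple pairwise observed-agreement score. The function names, the `min_agreement` threshold, and the concept labels in the usage note are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(labels: Sequence[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the label chosen by more than `min_agreement` of the workers.

    Assumption: plain majority voting; the paper's actual algorithm may differ.
    Returns None when no label is frequent enough (no consensus).
    """
    if not labels:
        return None
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes / len(labels) > min_agreement else None


def observed_agreement(labels: Sequence[str]) -> float:
    """Fraction of ordered worker pairs that gave the same label to one item.

    Assumption: raw observed agreement, not a chance-corrected coefficient.
    """
    n = len(labels)
    if n < 2:
        return 1.0
    counts = Counter(labels)
    agreeing_pairs = sum(c * (c - 1) for c in counts.values())
    return agreeing_pairs / (n * (n - 1))
```

For example, with three hypothetical worker judgments `["departure_city", "departure_city", "arrival_city"]`, `consensus_label` returns `"departure_city"`, while an even split between two workers yields `None`, so that item would need further judgments.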


Bibliographic reference. Chowdhury, Shammur Absar / Ghosh, Arindam / Stepanov, Evgeny A. / Bayer, Ali Orkan / Riccardi, Giuseppe / Klasinas, Ioannis (2014): "Cross-language transfer of semantic annotation via targeted crowdsourcing", in INTERSPEECH-2014, 2108-2112.