SLTU-2008 - First International Workshop on Spoken Languages Technologies for Under-Resourced Languages

Hanoi, Vietnam
May 5-7, 2008

Extending an On-Line Parallel Corpus Management System to Handle Specific Types of Structured Documents

Cong-Phap Huynh, Christian Boitet, Georges Fafiotte

Laboratoire LIG, GETALP, GETA, Université Joseph Fourier, Grenoble, France

Parallel bilingual or multilingual corpora are often handled as collections of segments without any specific document organization. We describe SECTra_w, a web-oriented system that has been used for online MT evaluations. It has recently been extended to handle two kinds of structured documents. The first kind is multimodal: interpreted bilingual spontaneous dialogues pairing French with Chinese, Vietnamese, Hindi, or Tamil, which are mainly spoken but also include some short texts. The second kind is multilingual written articles from an online encyclopedia, annotated with UNL graphs.

Keywords: parallel corpora, translation memories, multiple annotations, multimodal dialogues, multilingual documents


Bibliographic reference. Huynh, Cong-Phap / Boitet, Christian / Fafiotte, Georges (2008): "Extending an on-line parallel corpus management system to handle specific types of structured documents", In SLTU-2008, 80-85.