International Workshop on Spoken Language Translation (IWSLT) 2010
This paper improves the unsupervised method for extracting parallel sentence pairs from a comparable corpus that we presented in an earlier paper. In that earlier paper, a translation system was used to mine a comparable corpus and detect French-Vietnamese parallel sentence pairs. An iterative process was implemented to increase the number of extracted parallel sentence pairs, which improved the overall quality of the translation.
This paper validates the unsupervised approach on a new under-resourced language pair (Vietnamese-English) and addresses the problem of using triangulation through a third language to improve the parallel data mining process. An extension of the unsupervised method is proposed to make use of triangulation, and two ways of including the additional data obtained from triangulation are explored. Experiments conducted on Vietnamese-French show that triangulation through English can improve the quality of the extracted data and slightly improve the quality of the translation system as measured by BLEU.
Bibliographic reference. Diep, Do Thi Ngoc / Besacier, Laurent / Castelli, Eric (2010): "Improved Vietnamese-French parallel corpus mining using English language", In IWSLT-2010, 235-242.