International Workshop on Spoken Language Translation (IWSLT) 2012
For current statistical machine translation systems, reordering is still a major problem for language pairs like Chinese-English, where the source and target languages have significant word order differences. In this paper, we propose a novel reordering model based on sequence labeling techniques. Our model converts the reordering problem into a sequence labeling problem, i.e. a tagging task: for a given source sentence, we assign each source token a label that contains the reordering information for that token. We also design an unaligned word tag so that the phenomenon of unaligned words is incorporated directly into the proposed model. Our reordering model is conditioned on the whole source sentence, so it is able to capture long-range dependencies in the source sentence. Although training on large-scale tasks requires considerable computational resources, the decoder uses the tagging information as soft constraints. Therefore, while the training procedure of our model is computationally expensive for large tasks, our model is very efficient in the test phase (during translation). We carried out experiments on five Chinese-English NIST tasks trained with BOLT data. Results show that our model improves the baseline system by 1.32 BLEU and 1.53 TER on average.
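The abstract does not specify the exact label set, so the following is only a minimal sketch under an assumed scheme, not the authors' method. It shows how a word alignment could be turned into per-token reordering labels. Each aligned source token is labeled with its position (rank) in a source sentence reordered to follow target word order, using the mean aligned target position. Unaligned source tokens receive a special tag. The function name `reordering_labels` and the tag value `"N"` are hypothetical choices for illustration.

```python
from typing import Dict, List, Set, Tuple, Union

# Hypothetical tag for source words with no alignment link (assumption, not from the paper).
UNALIGNED = "N"


def reordering_labels(
    n_src: int, alignment: Set[Tuple[int, int]]
) -> List[Union[int, str]]:
    """Label each source token with its rank in a target-ordered source sentence.

    alignment holds 0-based (source_index, target_index) pairs.
    Aligned tokens get an integer rank; unaligned tokens get UNALIGNED.
    """
    # Collect the target positions each source token is aligned to.
    targets: Dict[int, List[int]] = {i: [] for i in range(n_src)}
    for i, j in alignment:
        targets[i].append(j)

    # Order aligned source tokens by mean target position, breaking ties by source index.
    aligned = [i for i in range(n_src) if targets[i]]
    order = sorted(aligned, key=lambda i: (sum(targets[i]) / len(targets[i]), i))
    rank = {i: r for r, i in enumerate(order)}

    # Emit one label per source token, in original source order.
    return [rank[i] if i in rank else UNALIGNED for i in range(n_src)]


if __name__ == "__main__":
    # Four source tokens; token 2 is unaligned, token 1 moves to the front.
    print(reordering_labels(4, {(0, 2), (1, 0), (3, 1)}))  # [2, 0, 'N', 1]
```

In a setup like this, label sequences extracted from a word-aligned training corpus would serve as training data for a sequence labeler conditioned on the whole source sentence (a conditional random field is one common choice). The predicted labels could then guide the decoder at test time.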
Bibliographic reference. Feng, Minwei / Peter, Jan-Thorsten / Ney, Hermann (2012): "Sequence labeling-based reordering model for phrase-based SMT", In IWSLT-2012, 260-267.