International Workshop on Spoken Language Translation (IWSLT) 2006
Keihanna Science City, Kyoto, Japan
The language model of the target language plays an important role in statistical machine translation systems. In this work, we propose to use a new statistical language model that is based on a continuous representation of the words in the vocabulary. A neural network is used to perform the projection and the probability estimation. This kind of approach is particularly promising for tasks where only a very limited amount of resources is available, such as the BTEC tourism corpus.
This language model is used in two state-of-the-art statistical machine translation systems that were developed by UPC for the 2006 IWSLT evaluation campaign: a phrase- and an n-gram-based approach. An experimental evaluation for four different language pairs is provided (translation of Mandarin, Japanese, Arabic and Italian to English). The proposed method achieved improvements in the BLEU score of up to 3 points on the development data and of almost 2 points on the official test data.
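As a rough illustration of the idea described above (words projected into a continuous space, with a neural network estimating the probability of the next word), the following minimal sketch shows a single forward pass. It is not the authors' implementation: the class name `ContinuousSpaceLM`, the layer sizes, the toy vocabulary, the tanh hidden layer, and the random untrained weights are all assumptions made for illustration, and bias terms and training are omitted.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Return a rows x cols matrix of small random weights (untrained)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ContinuousSpaceLM:
    """Hypothetical feed-forward language model over a fixed-size word context."""

    def __init__(self, vocab, context_size=2, embed_dim=4, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        self.vocab = list(vocab)
        self.index = {word: i for i, word in enumerate(self.vocab)}
        self.context_size = context_size
        # Projection layer: one continuous vector per vocabulary word.
        self.embeddings = make_matrix(len(self.vocab), embed_dim, rng)
        # Hidden and output layers used for the probability estimation.
        self.hidden = make_matrix(hidden_dim, context_size * embed_dim, rng)
        self.output = make_matrix(len(self.vocab), hidden_dim, rng)

    def predict(self, context):
        """Return a dict mapping each vocabulary word to P(word | context)."""
        if len(context) != self.context_size:
            raise ValueError("context must contain exactly context_size words")
        # Project each context word and concatenate the resulting vectors.
        x = []
        for word in context:
            x.extend(self.embeddings[self.index[word]])
        h = [math.tanh(a) for a in matvec(self.hidden, x)]
        scores = matvec(self.output, h)
        # Softmax, shifted by the maximum score for numerical stability.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return {word: e / total for word, e in zip(self.vocab, exps)}


lm = ContinuousSpaceLM(["where", "is", "the", "station", "hotel"])
probs = lm.predict(["is", "the"])
```

Because the weights here are random, the resulting distribution is close to uniform; in a trained model the projection and the network weights would be learned from the training corpus.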
Bibliographic reference. Schwenk, Holger / Costa-jussà, Marta R. / Fonollosa, José A. R. (2006): "Continuous space language models for the IWSLT 2006 task", In IWSLT-2006, 166-173.