International Workshop on Spoken Language Translation (IWSLT) 2006, Keihanna Science City, Kyoto, Japan
The language model of the target language plays an important
role in statistical machine translation systems. In this
work, we propose to use a new statistical language model that
is based on a continuous representation of the words in the
vocabulary. A neural network is used to perform both the projection
and the probability estimation. This approach is
particularly promising for tasks where only a very limited amount
of resources is available, such as the BTEC corpus of
tourism-related questions.
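The idea of projecting words into a continuous space and estimating probabilities with a neural network can be sketched as follows. This is only a minimal illustration, not the model used in the paper: the abstract does not specify the architecture, so the class name `ContinuousSpaceLM`, the layer sizes, the tanh hidden layer, the softmax output, the omission of bias terms, the toy vocabulary, and the untrained random weights are all assumptions.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Turn arbitrary scores into a probability distribution."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class ContinuousSpaceLM:
    """Hypothetical feed-forward language model with untrained weights."""

    def __init__(self, vocab, context_size=2, embed_dim=4, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        self.vocab = list(vocab)
        self.index = {word: i for i, word in enumerate(self.vocab)}
        self.context_size = context_size
        # Projection: one continuous vector per vocabulary word.
        self.embeddings = make_matrix(len(self.vocab), embed_dim, rng)
        # Probability estimation: hidden layer followed by an output layer.
        self.hidden = make_matrix(hidden_dim, context_size * embed_dim, rng)
        self.output = make_matrix(len(self.vocab), hidden_dim, rng)

    def next_word_distribution(self, context):
        """Return P(word | context) for every word in the vocabulary."""
        if len(context) != self.context_size:
            raise ValueError("context must contain exactly context_size words")
        # Concatenate the continuous representations of the context words.
        projected = []
        for word in context:
            projected.extend(self.embeddings[self.index[word]])
        hidden = [math.tanh(a) for a in matvec(self.hidden, projected)]
        probs = softmax(matvec(self.output, hidden))
        return dict(zip(self.vocab, probs))


# Toy vocabulary in the spirit of tourism-related questions (assumed, not from BTEC).
vocab = ["where", "is", "the", "station", "hotel"]
lm = ContinuousSpaceLM(vocab)
dist = lm.next_word_distribution(["where", "is"])
```

Because the weights here are random, `dist` is close to uniform; in a trained model the projection and the output layers would be learned jointly so that words appearing in similar contexts receive nearby vectors and similar probabilities.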
This language model is used in two state-of-the-art statistical
machine translation systems that were developed by
UPC for the 2006 IWSLT evaluation campaign: a phrase-based and
an n-gram-based approach. An experimental evaluation is
provided for four language pairs (translation from Mandarin,
Japanese, Arabic and Italian into English). The proposed
method improved the BLEU score by up to
3 points on the development data and by almost 2 points on
the official test data.
Bibliographic reference. Schwenk, Holger / Costa-jussà, Marta R. / Fonollosa, José A. R. (2006): "Continuous space language models for the IWSLT 2006 task", In IWSLT-2006, 166-173.