International Workshop on
Spoken Language Translation (IWSLT) 2004
Keihanna Science City, Kyoto, Japan
September 30-October 1, 2004
Toward the Evaluation of Speech Translation (Panel Discussion)
Marcello Federico (1), Young-suk Lee (2), Hermann Ney (2), Stephan Vogel (2)
(1) Moderator; (2) Panelists
The evaluation of conversational-speech translation systems raises many technical
issues. For the sake of stimulating the discussion, some general problems and
proposals are briefly introduced, which will be integrated with the presentations given
by the invited panelists.
- Speech translation requires carefully considering the goal of the task itself.
While, e.g., broadcast news translation can be treated similarly to written text
translation, different ideas of translation could be considered for conversational
speech. For this task, professional human translators typically refer to three
"interpreting modalities": simultaneous, consecutive and liaison. Simply speaking,
all modalities require the human interpreter to listen to a given amount of
speech, to recount what has been said, to listen again, and so on. Probably, the
least ambitious scenario for automatic SLT might be that of simultaneous
interpreting, which typically requires the human to translate at very short
intervals, e.g. every few seconds, or even in real time. Simultaneous interpreting
is physically very demanding and, because of the strict time constraints, leaves
interpreters less able to exploit their linguistic and domain knowledge. For both
reasons, users accept less fluent, close-to-literal translations.
- Given that speech translation relies on automatic speech recognition (ASR), the
task should be tailored to the affordable ASR accuracy. In the past,
interlingua-based systems have been applied to resemble the way a liaison
interpreter works, e.g. at a meeting or appointment. In particular, the interpreter
is assumed to be familiar with the subject under discussion and uses
psychological skills to facilitate communication. While the mediator metaphor
seemed appropriate, especially in the presence of noisy input, interlingua
approaches have shown little ability to cope with poor speech recognition
performance, and have performed significantly worse than purely data-driven
translation models. Nevertheless, any plan for speech translation evaluation
should take into account progress in the area of speech recognition and scale up
the difficulty of the considered tasks accordingly.
- Human and automatic evaluation should take into account important differences
between written and spoken language. Practically, how should input sentences
containing disfluencies and syntactic errors be treated? What kind of human
translations should be taken as target references? The simultaneous interpreting
scenario suggests putting more emphasis on adequacy than on fluency.
Moreover, appropriate reference translations could be obtained by transcribing
human interpreters working in realistic conditions.
Federico, Marcello / Lee, Young-suk / Ney, Hermann / Vogel, Stephan (2004):
"Toward the evaluation of speech translation (panel discussion)",