Key contextual information such as word prominence, emphasis, and contrast is typically ignored in speech-to-speech (S2S) translation because of the compartmentalized nature of the translation process. Conventional S2S systems extract prosody-dependent cues from the hypothesized (possibly erroneous) translation output using only words and syntax. In contrast, we propose the use of factored translation models to integrate the assignment and transfer of pitch accents (tonal prominence) during translation. We report experiments on two parallel corpora (Farsi-English and Japanese-English). The proposed factored translation models provide relative improvements of 8.4% and 16.8% in pitch accent labeling accuracy over the post-processing approach for the two corpora, respectively.
Bibliographic reference. Sridhar, Vivek Kumar Rangarajan / Bangalore, Srinivas / Narayanan, Shrikanth S. (2008): "Factored translation models for enriching spoken language translation with prosody", In INTERSPEECH-2008, 2723-2726.