ISCA Archive Interspeech 2008

Factored translation models for enriching spoken language translation with prosody

Vivek Kumar Rangarajan Sridhar, Srinivas Bangalore, Shrikanth S. Narayanan

Key contextual information such as word prominence, emphasis, and contrast is typically ignored in speech-to-speech (S2S) translation due to the compartmentalized nature of the translation process. Conventional S2S systems rely on extracting prosody-dependent cues from hypothesized (possibly erroneous) translation output using only words and syntax. In contrast, we propose the use of factored translation models to integrate the assignment and transfer of pitch accents (tonal prominence) during translation. We report experiments on two parallel corpora (Farsi-English and Japanese-English). The proposed factored translation models provide relative improvements of 8.4% and 16.8% in pitch accent labeling accuracy over the post-processing approach for the two corpora, respectively.
doi: 10.21437/Interspeech.2008-675

Cite as: Sridhar, V.K.R., Bangalore, S., Narayanan, S.S. (2008) Factored translation models for enriching spoken language translation with prosody. Proc. Interspeech 2008, 2723-2726, doi: 10.21437/Interspeech.2008-675

@inproceedings{sridhar08_interspeech,
  author={Vivek Kumar Rangarajan Sridhar and Srinivas Bangalore and Shrikanth S. Narayanan},
  title={{Factored translation models for enriching spoken language translation with prosody}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={2723--2726},
  doi={10.21437/Interspeech.2008-675}
}