Speech-to-speech (S2S) translation has attracted increasing interest in recent years, with research focused mainly on lexical aspects. It is widely acknowledged, however, that incorporating other rich information carried by speech, such as expressive prosody, can enhance the cross-lingual communication experience. Motivated by recent empirical findings showing a positive relation between the transfer of emphasis and the quality of the audio translation, we propose a computational method to derive a set of acoustic cues that can be used to transfer emphasis for the English-Spanish language pair. In particular, we present an iterative algorithm that aims to discover the set of acoustic cue pairs in the two languages that maximizes the accuracy of emphasis transfer. We find that the relevant acoustic cues can be constructed from a diverse set of features, including word/phrase-level statistics of spectral, intensity and prosodic cues, and that they can model the acoustic information related to emphasized and neutral words/phrases for the English-Spanish language pair. These features can in turn enable data-driven transformations from source to target language that preserve such rich prosodic information. We demonstrate the efficacy of this approach through experiments on a specially constructed corpus of 1800 English-Spanish words/phrases.
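The abstract does not specify the iterative algorithm, so the following is only an illustrative sketch of one way an iterative search over cue pairs could look. It is a greedy forward selection, not the authors' method. It assumes each parallel word/phrase has already been reduced to binary cue activations (for example, "pitch above speaker mean"), and it scores a set of pairs by how often a majority vote over the source cues agrees with a majority vote over the target cues. The names `vote`, `transfer_accuracy`, `greedy_select` and the cue labels are hypothetical.

```python
from itertools import product


def vote(cues, names):
    """Majority vote over binary cue activations; ties count as not emphasized."""
    return sum(cues[n] for n in names) * 2 > len(names)


def transfer_accuracy(samples, pairs):
    """Fraction of samples whose source-side and target-side votes agree.

    `samples` is a list of (source_cues, target_cues) dicts mapping a
    hypothetical cue name to True/False. This agreement score is an assumed
    stand-in for the paper's (unspecified) transfer criterion.
    """
    if not pairs or not samples:
        return 0.0
    src_names = [s for s, _ in pairs]
    tgt_names = [t for _, t in pairs]
    hits = sum(
        vote(src, src_names) == vote(tgt, tgt_names) for src, tgt in samples
    )
    return hits / len(samples)


def greedy_select(samples, src_cues, tgt_cues, max_pairs=3):
    """Iteratively add the cue pair that most improves the agreement score."""
    selected, best = [], 0.0
    candidates = list(product(src_cues, tgt_cues))
    while len(selected) < max_pairs:
        scored = [
            (transfer_accuracy(samples, selected + [pair]), pair)
            for pair in candidates
            if pair not in selected
        ]
        if not scored:
            break
        score, pair = max(scored, key=lambda item: item[0])
        if score <= best:
            break  # no remaining pair improves the score; stop iterating
        selected.append(pair)
        best = score
    return selected, best
```

Calling `greedy_select(samples, ["pitch", "dur"], ["energy", "f0"])` on a small set of labeled word pairs returns the chosen pairs and their agreement score. A real system would work with continuous word/phrase-level statistics and a held-out evaluation rather than binary votes on the training data.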
Bibliographic reference. Tsiartas, Andreas / Georgiou, Panayiotis G. / Narayanan, Shrikanth (2013): "Toward transfer of acoustic cues of emphasis across languages", In INTERSPEECH-2013, 3483-3486.