Sixth ISCA Workshop on Speech Synthesis

Bonn, Germany
August 22-24, 2007

Spoken Language Conversion with Accent Morphing

Mark Huckvale, Kayoko Yanagisawa

Department of Phonetics and Linguistics, University College London, London, UK

Spoken language conversion is the challenge of using synthesis systems to generate utterances in a given speaker's voice but in a language that the speaker does not know. Previous approaches have applied voice conversion and voice adaptation technologies to the output of a foreign-language TTS system. This inevitably reduces the quality and intelligibility of the output, since the source speaker is not a good source of phonetic material in the new language. This paper contrasts that previous work with a new approach that uses two synthesis systems: one in the source speaker's voice and one in the voice of a native speaker of the target language. Audio morphing technology is then used to correct the foreign accent of the source speaker while trying to maintain his or her identity. We construct a spoken language conversion system using accent morphing and evaluate its performance in terms of intelligibility. The results are encouraging and tell us more about the challenges of spoken language conversion.

E    Unmodified English TTS (source)
J    Unmodified Japanese TTS (model)
A    Segmental morphing alone (from J)
P    Pitch morphing alone (from J)
R    Rhythm morphing alone (from J)
PR   Pitch & Rhythm morphing (from J)
APR  Segment, Pitch & Rhythm morphing (from J)
N    Natural Japanese (control)
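
The morphed condition codes in the list above are built from three letters (A, P, R), so a code can be read as the set of dimensions that are morphed. As a minimal sketch, the Python below makes that reading explicit. The names `DIMENSIONS`, `UNMORPHED` and `morphed_dimensions` are hypothetical and are not taken from the authors' system. The sketch also assumes that "(from J)" means the named dimension is taken from the Japanese model voice.

```python
# Illustrative only: the codes and labels come from the condition list;
# every identifier below is a hypothetical name chosen for this sketch.

# Letters that name a morphed dimension (assumed to be taken from J).
DIMENSIONS = {"A": "segmental", "P": "pitch", "R": "rhythm"}

# Conditions in which nothing is morphed.
UNMORPHED = {
    "E": "Unmodified English TTS (source)",
    "J": "Unmodified Japanese TTS (model)",
    "N": "Natural Japanese (control)",
}

ALL_CODES = ["E", "J", "A", "P", "R", "PR", "APR", "N"]


def morphed_dimensions(code):
    """Return the set of dimensions morphed in the given condition."""
    if code in UNMORPHED:
        return frozenset()
    if not code or set(code) - set(DIMENSIONS):
        raise ValueError(f"unknown condition code: {code!r}")
    return frozenset(DIMENSIONS[letter] for letter in code)


if __name__ == "__main__":
    for code in ALL_CODES:
        print(code, sorted(morphed_dimensions(code)))
```

Under this reading, PR differs from APR only in that the segmental dimension is left unmorphed.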

