This paper presents a novel non-native speech synthesis technique that preserves the individuality of a non-native speaker. Cross-lingual speech synthesis based on voice conversion or HMM-based speech synthesis, which synthesizes foreign language speech of a specific non-native speaker reflecting the speaker-dependent acoustic characteristics extracted from the speaker's natural speech in his/her mother tongue, tends to cause a degradation of speaker individuality in synthetic speech compared to intra-lingual speech synthesis. This paper proposes a new approach to cross-lingual speech synthesis that preserves speaker individuality by explicitly using non-native speech spoken by the target speaker. Although the use of non-native speech makes it possible to preserve the speaker individuality in the synthesized target speech, naturalness is significantly degraded as the speech is directly affected by unnatural prosody and pronunciation often caused by differences in the linguistic systems of the source and target languages. To improve naturalness while preserving speaker individuality, we propose (1) a prosodic correction method based on model adaptation, and (2) a phonetic correction method based on spectrum replacement for unvoiced consonants. The experimental results demonstrate that these proposed methods are capable of significantly improving naturalness while preserving the speaker individuality in synthetic speech.
Bibliographic reference. Oshima, Yuji / Takamichi, Shinnosuke / Toda, Tomoki / Neubig, Graham / Sakti, Sakriani / Nakamura, Satoshi (2015): "Non-native speech synthesis preserving speaker individuality based on partial correction of prosodic and phonetic characteristics", In INTERSPEECH-2015, 299-303.