In this paper, we convert the pitch contours predicted by a TTS system that models a source speaker so that they resemble the pitch contours of a target speaker. When the speaking styles of the two speakers differ greatly, complex conversions such as adding or deleting pitch peaks may be required. Our method performs these conversions by simultaneously modeling direct pitch features and differential pitch features, both based on linguistic features. The differential pitch features are calculated from matched pairs of source and target pitch values. We show experimental results in which the target speaker's characteristics are successfully modeled from a very limited training corpus. The proposed pitch conversion method extends the possibilities of TTS customization for various speaking styles.
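As a rough illustration of what a differential pitch feature computed from matched source and target pitch values might look like, the following minimal sketch takes the per-pair difference in the log-F0 domain. The abstract does not specify the feature definition, so the log-domain difference, the assumption that the pairs are already aligned and voiced, and the function names `differential_pitch` and `apply_differential` are all hypothetical choices for illustration and not the authors' implementation.

```python
import math


def differential_pitch(source_f0, target_f0):
    """Return per-pair log-F0 differences (target minus source).

    Assumption: both lists hold positive F0 values in Hz that have
    already been matched one-to-one (e.g. per mora or per frame).
    """
    if len(source_f0) != len(target_f0):
        raise ValueError("source and target must be matched pairs")
    return [math.log(t) - math.log(s) for s, t in zip(source_f0, target_f0)]


def apply_differential(source_f0, deltas):
    """Shift source F0 values by log-domain deltas to approximate the target."""
    if len(source_f0) != len(deltas):
        raise ValueError("one delta is required per source value")
    return [s * math.exp(d) for s, d in zip(source_f0, deltas)]


# Hypothetical matched pitch values in Hz (illustrative only, not real data).
source = [120.0, 130.0, 125.0]
target = [180.0, 170.0, 200.0]

deltas = differential_pitch(source, target)
converted = apply_differential(source, deltas)
```

In this sketch the deltas are computed directly from known target values. In a conversion setting the deltas would instead be predicted from linguistic features by a trained model, which is the part the paper itself addresses.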
Bibliographic reference. Tachibana, Ryuki / Shuang, Zhiwei / Nishimura, Masafumi (2009): "Japanese pitch conversion for voice morphing based on differential modeling", In INTERSPEECH-2009, 2651-2654.