Phonetically Aware Exemplar-Based Prosody Transformation

Berrak Sisman, Grandee Lee, Haizhou Li


In this paper, we propose a novel prosody transformation framework for voice conversion that makes use of phonetic information. The proposed framework is motivated by two observations. First, phonetic prosody, the part of speech prosody that is influenced by the phonetic content of an utterance, is an important aspect of speech prosody. We propose the use of phone-dependent dictionaries, which we call phonetic dictionaries, that allow for effective phonetic prosody conversion. Second, in traditional exemplar-based sparse representation frameworks, the estimated activation matrix depends heavily on the source speech, which is not ideal for generating target speech. We propose to incorporate Phonetic PosteriorGrams (PPGs), which represent frame-level phonetic information, as part of the exemplars of the dictionaries. Because the exemplars now include PPGs, which are expected to be speaker-independent, the resulting activation matrix depends less on the source speaker and thus represents a better transformation function for prosody transformation. Experiments show that the proposed prosody transformation framework outperforms the traditional frameworks in both objective and subjective evaluations.
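The role of the activation matrix described above can be illustrated with a minimal sketch of generic exemplar-based sparse representation. This is not the authors' implementation: the abstract does not specify the feature types, dimensions, solver, or sparsity weight, so everything below is an assumption made for illustration. In particular, the names `estimate_activations`, `A_joint`, and `B_tgt_pros`, the random toy data, the non-negativity of the prosody features, and the multiplicative-update solver are all hypothetical choices. The sketch stacks PPG rows on top of source prosody rows to form joint exemplars, estimates a non-negative sparse activation matrix against that fixed dictionary, and applies the same activations to a paired target dictionary.

```python
import numpy as np


def objective(X, A, H, sparsity):
    """0.5 * ||X - A H||_F^2 + sparsity * sum(H), the cost minimized below."""
    return 0.5 * np.sum((X - A @ H) ** 2) + sparsity * np.sum(H)


def estimate_activations(X, A, n_iter=200, sparsity=0.1, eps=1e-12, seed=0):
    """Estimate non-negative activations H with the dictionary A held fixed.

    Uses multiplicative updates for the L1-penalized Euclidean cost.
    Assumes X and A are non-negative (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[1], X.shape[1])) + eps
    AtX = A.T @ X
    AtA = A.T @ A
    for _ in range(n_iter):
        # Element-wise update; keeps H non-negative at every step.
        H *= AtX / (AtA @ H + sparsity + eps)
    return H


# Toy, randomly generated data (hypothetical sizes, not from the paper).
rng = np.random.default_rng(1)
n_exemplars, ppg_dim, pros_dim, n_frames = 20, 5, 3, 8

# Paired dictionaries: column k of each matrix belongs to the same exemplar.
A_ppg = rng.random((ppg_dim, n_exemplars))
A_ppg /= A_ppg.sum(axis=0, keepdims=True)         # posteriors sum to 1
A_src_pros = rng.random((pros_dim, n_exemplars))  # source prosody exemplars
B_tgt_pros = rng.random((pros_dim, n_exemplars))  # target prosody exemplars

# Joint exemplars: PPG rows stacked on top of source prosody rows.
A_joint = np.vstack([A_ppg, A_src_pros])

# Source utterance frames in the same stacked layout.
X_ppg = rng.random((ppg_dim, n_frames))
X_ppg /= X_ppg.sum(axis=0, keepdims=True)
X_joint = np.vstack([X_ppg, rng.random((pros_dim, n_frames))])

# Activations estimated on the joint features, then applied to the target side.
H = estimate_activations(X_joint, A_joint)
Y_tgt_pros = B_tgt_pros @ H
```

Because the PPG rows take part in the fit, the activations are constrained by phonetic content as well as by the source speaker's prosody features, which is the intuition the abstract gives for a less source-dependent activation matrix. A phone-dependent variant would keep one such dictionary pair per phone and select it frame by frame; that step is omitted here.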


DOI: 10.21437/Odyssey.2018-38

Cite as: Sisman, B., Lee, G., Li, H. (2018) Phonetically Aware Exemplar-Based Prosody Transformation. Proc. Odyssey 2018 The Speaker and Language Recognition Workshop, 267-274, DOI: 10.21437/Odyssey.2018-38.


@inproceedings{Sisman2018,
  author={Berrak Sisman and Grandee Lee and Haizhou Li},
  title={Phonetically Aware Exemplar-Based Prosody Transformation},
  year=2018,
  booktitle={Proc. Odyssey 2018 The Speaker and Language Recognition Workshop},
  pages={267--274},
  doi={10.21437/Odyssey.2018-38},
  url={http://dx.doi.org/10.21437/Odyssey.2018-38}
}