ISCA Archive SpeechProsody 2022

Can Prosody Transfer Embeddings be Used for Prosody Assessment?

Mariana Julião, Alberto Abad, Helena Moniz

In voice conversion, it is possible to transfer some characteristic components of a speech utterance, such as the content, pitch, or speaker identity, from one (source) utterance to another (target) utterance. This has recently been achieved by characterizing these components through neural-based vector embeddings which encode the specific information to be transferred. In the particular case of neural prosody embeddings, to the best of our knowledge, no work has explored how informative these embeddings are for other purposes, such as prosody assessment or the comparison of prosodic patterns. In this work, we use an intonation data set and a voice conversion corpus to explore how these neural prosody embeddings group for utterances of different intonation, content, and speaker identity. We compare these neural prosody embeddings to hand-crafted acoustic-prosodic features and to content embeddings. We found that neural prosody embeddings can achieve a geometrical separability index as high as 0.956 for highly contrastive intonations, and 0.706 for different sentence types.
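The abstract does not define the geometrical separability index, so the following is only a minimal sketch under the assumption that it is the nearest-neighbour measure commonly known by that name: the fraction of points whose nearest neighbour (by Euclidean distance) carries the same label. The function name, the toy vectors, and the labels are hypothetical and are not taken from the paper.

```python
import numpy as np

def geometric_separability_index(embeddings, labels):
    """Fraction of points whose nearest neighbour shares their label.

    Assumption: Euclidean distance and a single nearest neighbour;
    the paper may use a different distance or definition.
    """
    X = np.asarray(embeddings, dtype=float)   # shape (n_points, n_dims)
    y = np.asarray(labels)
    # Pairwise squared Euclidean distances, shape (n_points, n_points).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    # A point must not count as its own nearest neighbour.
    np.fill_diagonal(dist, np.inf)
    nearest = dist.argmin(axis=1)
    return float(np.mean(y[nearest] == y))

# Hypothetical toy "embeddings": two well-separated intonation classes.
toy_embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
toy_labels = ["statement", "statement", "question", "question"]
gsi_separated = geometric_separability_index(toy_embeddings, toy_labels)

# Hypothetical toy case where the classes overlap along one dimension.
mixed_embeddings = [[0.0], [1.0], [3.0], [10.0]]
mixed_labels = ["a", "b", "b", "a"]
gsi_mixed = geometric_separability_index(mixed_embeddings, mixed_labels)
```

Under this assumed definition, a value near 1 means that utterances of the same class sit next to each other in the embedding space, while a value near the chance level means the classes are interleaved.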

doi: 10.21437/SpeechProsody.2022-60

Cite as: Julião, M., Abad, A., Moniz, H. (2022) Can Prosody Transfer Embeddings be Used for Prosody Assessment? Proc. Speech Prosody 2022, 292-296, doi: 10.21437/SpeechProsody.2022-60

  author={Mariana Julião and Alberto Abad and Helena Moniz},
  title={{Can Prosody Transfer Embeddings be Used for Prosody Assessment?}},
  booktitle={Proc. Speech Prosody 2022},