ISCA Archive SSW 2013

Towards speaking style transplantation in speech synthesis

Jaime Lorenzo-Trueba, Roberto Barra-Chicote, Junichi Yamagishi, Oliver Watts, Juan Manuel Montero

One of the biggest challenges in speech synthesis is the production of natural-sounding synthetic voices. This means that the resulting voice must not only be of high enough quality but must also capture the natural expressiveness of human speech. This paper focuses on the expressiveness problem by proposing a set of techniques for extrapolating the expressiveness of proven high-quality expressive models to neutral speakers in HMM-based synthesis. As an additional advantage, the proposed techniques are based on adaptation approaches, which means that they can be used with little training data (around 15 minutes of training data per style in this paper). For the final implementation, a set of 4 speaking styles was considered: news broadcasts, live sports commentary, interviews and political speech. Finally, the implementations of the 5 techniques were tested through a perceptual evaluation, which shows that the deviations between neutral and expressive average models can be learned and used to imbue expressiveness into target neutral speakers as intended.

Index Terms: expressive speech synthesis, speaking styles, adaptation, expressiveness transplantation

Cite as: Lorenzo-Trueba, J., Barra-Chicote, R., Yamagishi, J., Watts, O., Montero, J.M. (2013) Towards speaking style transplantation in speech synthesis. Proc. 8th ISCA Workshop on Speech Synthesis (SSW 8), 159-163

  author={Jaime Lorenzo-Trueba and Roberto Barra-Chicote and Junichi Yamagishi and Oliver Watts and Juan Manuel Montero},
  title={{Towards speaking style transplantation in speech synthesis}},
  booktitle={Proc. 8th ISCA Workshop on Speech Synthesis (SSW 8)},