ISCA Archive SSW 2023

Using a Large Language Model to Control Speaking Style for Expressive TTS

Atli Thor Sigurgeirsson, Simon King

Large generative language models have been used to solve various language-related tasks. We explore whether such models can suggest appropriate prosody for expressive TTS. We train a TTS model and then prompt the language model to suggest appropriate changes to pitch, energy and duration. The prompt can be designed for any task; here, we prompt the model to make suggestions based on the target speaking style and the dialogue context. The proposed method is rated most appropriate in 49.9% of cases, compared to 31.0% for a baseline model.
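
The prompting step described above might be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the -2 to 2 integer scale, the reply format, and the helper names `build_prompt` and `parse_suggestions` are all assumptions, and the call to an actual language model is replaced by a hard-coded example reply.

```python
import re

PROSODY_FEATURES = ("pitch", "energy", "duration")


def build_prompt(text, style, context):
    """Assemble a plain-text prompt (wording is an illustrative assumption)."""
    history = "\n".join(f"- {turn}" for turn in context)
    return (
        f"Dialogue so far:\n{history}\n"
        f"Next utterance: '{text}'\n"
        f"Target speaking style: {style}\n"
        "Suggest a change for each of pitch, energy and duration as an integer "
        "from -2 to 2, one per line, formatted as 'feature: value'."
    )


def parse_suggestions(reply):
    """Extract per-feature changes from a reply; missing features default to 0."""
    changes = {name: 0 for name in PROSODY_FEATURES}
    pattern = r"(pitch|energy|duration)\s*:\s*([+-]?\d+)"
    for name, value in re.findall(pattern, reply.lower()):
        # Clamp to the assumed -2..2 scale so stray values stay in range.
        changes[name] = max(-2, min(2, int(value)))
    return changes


# Hypothetical reply standing in for a real language model call.
example_reply = "Pitch: +2\nEnergy: 1\nDuration: -1"
print(parse_suggestions(example_reply))
```

In such a setup, the parsed values would then be passed to the TTS model as adjustments to its pitch, energy and duration controls; how that mapping is done is not specified here.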


Cite as: Sigurgeirsson, A.T., King, S. (2023) Using a Large Language Model to Control Speaking Style for Expressive TTS. Proc. 12th ISCA Speech Synthesis Workshop (SSW2023), 246-247

@inproceedings{sigurgeirsson23_ssw,
  author={Atli Thor Sigurgeirsson and Simon King},
  title={{Using a Large Language Model to Control Speaking Style for Expressive TTS}},
  year=2023,
  booktitle={Proc. 12th ISCA Speech Synthesis Workshop (SSW2023)},
  pages={246--247}
}