Prosody Prediction from Syntactic, Lexical, and Word Embedding Features

Rose Sloan, Syed Sarfaraz Akhtar, Bryan Li, Ritvik Shrivastava, Agustin Gravano, Julia Hirschberg

Accurate prosody prediction from text leads to more natural-sounding text-to-speech (TTS) synthesis. In this work, we employ a new set of features to predict ToBI pitch accent and phrase boundaries from text. We investigate a wide variety of text-based features, including many new syntactic features, several types of word embeddings, co-reference features, LIWC features, and specificity information. We focus on the Boston Radio News Corpus, a ToBI-labeled corpus of relatively clean news broadcasts. We also test our classifiers on Audix, a smaller corpus of read news, and on the Columbia Games Corpus, a corpus of conversational speech, to assess how well our models generalize across corpora. Our results show strong performance on both tasks, as well as promising results for cross-corpus applications of our models.
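Both tasks can be framed as per-word binary classification over text-derived features. The sketch below is not the authors' model. It assumes a small hand-made function-word list (`FUNCTION_WORDS`, hypothetical) and two simple rule-based baselines: accent content words, and place phrase boundaries at punctuation or the end of an utterance. A learned classifier using richer syntactic, embedding, co-reference, LIWC, and specificity features would be compared against baselines like these. The binary labeling (accent vs. no accent, boundary vs. no boundary) is also an assumption made here for illustration.

```python
from dataclasses import dataclass

# Hypothetical, hand-picked function-word list; a real system would use POS tags.
FUNCTION_WORDS = {
    "a", "an", "the", "of", "to", "in", "on", "at", "for", "and", "or",
    "but", "is", "was", "are", "were", "be", "it", "he", "she", "they",
    "that", "this", "with", "by", "as", "from",
}
PUNCTUATION = (",", ".", ";", ":", "!", "?")


@dataclass
class WordFeatures:
    word: str
    is_function_word: bool
    length: int
    position: int   # index of the word within the utterance
    is_last: bool   # True for the final word of the utterance


def extract_features(tokens):
    """Compute a few simple per-word text features (illustrative only)."""
    n = len(tokens)
    return [
        WordFeatures(
            word=tok,
            is_function_word=tok.lower().strip("".join(PUNCTUATION)) in FUNCTION_WORDS,
            length=len(tok),
            position=i,
            is_last=(i == n - 1),
        )
        for i, tok in enumerate(tokens)
    ]


def predict_accent(feats):
    """Baseline: 1 (accented) for content words, 0 for function words."""
    return [0 if f.is_function_word else 1 for f in feats]


def predict_boundary(feats):
    """Baseline: 1 (boundary) after trailing punctuation or at utterance end."""
    return [1 if f.is_last or f.word.endswith(PUNCTUATION) else 0 for f in feats]


def accuracy(pred, gold):
    """Fraction of words whose predicted label matches the gold label."""
    assert len(pred) == len(gold) and gold, "label sequences must be non-empty and aligned"
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)
```

For example, `extract_features("The mayor announced a new budget on Tuesday.".split())` yields accent predictions `[0, 1, 1, 0, 1, 1, 0, 1]` and boundary predictions `[0, 0, 0, 0, 0, 0, 0, 1]`. In a cross-corpus setting, `accuracy` would be computed against gold ToBI labels from a held-out corpus rather than the training corpus.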

DOI: 10.21437/SSW.2019-48

Cite as: Sloan, R., Akhtar, S.S., Li, B., Shrivastava, R., Gravano, A., Hirschberg, J. (2019) Prosody Prediction from Syntactic, Lexical, and Word Embedding Features. Proc. 10th ISCA Speech Synthesis Workshop, 269-274, DOI: 10.21437/SSW.2019-48.
