Paragraph Prosodic Patterns to Enhance Text-to-Speech Naturalness

Alex Peiró-Lilja, Mireia Farrús

Speech synthesis has reached a reasonably high quality in recent years. However, there is still room for improvement in terms of naturalness and expressiveness when dealing with large multi-sentential discourse, since most text-to-speech synthesizers do not fully take into account the prosodic differences that have been observed in discourse units such as paragraphs. This work presents an implementation of paragraph-based prosodic patterns in the open-source MARYTTS platform, enriching its prosody output by means of intra- and inter-paragraph prosodic features. The set of features includes pitch decay, pitch range, and speech rate variation (as intra-paragraph features), as well as paragraph break pauses and speech rate variation (as inter-paragraph features), previously analyzed in a large set of TED Talks and read-speech sections of the Spoken Wikipedia Corpus. The perception tests, performed with both English and German parametric voices, suggest that paragraph-based features should be studied further and taken into account in future implementations that synthesize large stretches of discourse.

DOI: 10.21437/SpeechProsody.2018-124

Cite as: Peiró-Lilja, A., Farrús, M. (2018) Paragraph Prosodic Patterns to Enhance Text-to-Speech Naturalness. Proc. 9th International Conference on Speech Prosody 2018, 612-616, DOI: 10.21437/SpeechProsody.2018-124.

  author={Alex {Peiró-Lilja} and Mireia Farrús},
  title={Paragraph Prosodic Patterns to Enhance Text-to-Speech Naturalness},
  booktitle={Proc. 9th International Conference on Speech Prosody 2018},