Synthesizing sports commentaries: One or several emphatic stresses?

Sandrine Brognaux, Thomas Drugman, Marco Saerens


Emphatic stresses are known to fulfill essential functions in expressive speech. Their integration in speech synthesis usually relies on a prosodic annotation of the training corpus. Emphasized syllables are then either assigned a single label or given one of several labels according to their acoustic realization. Although predicting several labels for a new text to synthesize is more complex, it might allow for a better rendering of the stress in the synthesized speech. This paper examines whether the use of more than one emphatic label improves the perceived expressivity of the synthesized speech. It relies on a manually annotated expressive corpus of sports commentaries. Statistical acoustic analyses show that four distinct realizations of emphatic stresses can be distinguished. However, perceptual tests indicate that integrating this distinction in HMM-based speech synthesis does not lead to a significant improvement in expressivity. This suggests that the different acoustic realizations of the stress need not be explicitly annotated in the training corpus.


DOI: 10.21437/SpeechProsody.2014-41

Cite as: Brognaux, S., Drugman, T., Saerens, M. (2014) Synthesizing sports commentaries: One or several emphatic stresses? Proc. 7th International Conference on Speech Prosody 2014, 270-274, DOI: 10.21437/SpeechProsody.2014-41.


@inproceedings{Brognaux2014,
  author={Sandrine Brognaux and Thomas Drugman and Marco Saerens},
  title={{Synthesizing sports commentaries: One or several emphatic stresses?}},
  year=2014,
  booktitle={Proc. 7th International Conference on Speech Prosody 2014},
  pages={270--274},
  doi={10.21437/SpeechProsody.2014-41},
  url={http://dx.doi.org/10.21437/SpeechProsody.2014-41}
}