8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Multilayered Extensions to the Speech Synthesis Markup Language for Describing Expressiveness

E. Eide, R. Bakis, W. Hamza, J. Pitrelli

IBM T.J. Watson Research Center, USA

We discuss possible extensions to the Speech Synthesis Markup Language (SSML) that facilitate the generation of expressive synthetic speech. The proposed extensions are hierarchical: they allow specification in terms of physical parameters such as instantaneous pitch, higher-level parameters such as ToBI labels, or abstract concepts such as emotions. Low-level tags tend to change value frequently, even within a word, while the more abstract tags generally apply to whole words, sentences, or paragraphs. We envision interfaces at different levels to serve different types of users: speech experts may want to use the low-level interfaces, while artists may prefer to interact with the TTS system at more abstract levels.
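As a rough illustration of the layering described above, the following Python sketch builds a small markup fragment with one tag at each level. Only `speak`, `s`, and `prosody` (with its `contour` attribute) are standard SSML 1.0 elements; the `emotion` and `tobi` element names, their attributes, and the `good-news` value are hypothetical placeholders chosen for this example and are not taken from the paper.

```python
import xml.etree.ElementTree as ET


def build_layered_markup() -> str:
    """Return a markup fragment with one tag per abstraction level."""
    # Standard SSML root element (namespace and xml:lang omitted for brevity).
    speak = ET.Element("speak", {"version": "1.0"})

    # Abstract level (HYPOTHETICAL tag): an emotion spanning a whole sentence.
    emotion = ET.SubElement(speak, "emotion", {"name": "good-news"})
    sentence = ET.SubElement(emotion, "s")
    sentence.text = "Your flight is "

    # Intermediate level (HYPOTHETICAL tag): a ToBI pitch accent on one word.
    accent = ET.SubElement(sentence, "tobi", {"accent": "L+H*"})

    # Low level (standard SSML): a pitch contour that varies within the word.
    prosody = ET.SubElement(
        accent, "prosody", {"contour": "(0%,+10Hz) (50%,+40Hz) (100%,+5Hz)"}
    )
    prosody.text = "confirmed"

    # Text following the accented word, still inside the sentence.
    accent.tail = "."
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_layered_markup())
```

The nesting mirrors the point made in the abstract: the outermost tag covers a sentence, the middle tag covers a word, and the innermost tag changes value within that word.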


Bibliographic reference. Eide, E. / Bakis, R. / Hamza, W. / Pitrelli, J. (2003): "Multilayered extensions to the speech synthesis markup language for describing expressiveness", in EUROSPEECH-2003, 1645-1648.