EUROSPEECH 2003 - INTERSPEECH 2003
The W3C Speech Synthesis Markup Language (SSML) unifies a number of recent, related markup languages that emerged to meet the perceived need for increased and standardized user control over Text-to-Speech (TTS) engines. One of the main drivers for markup has been the increasing use of TTS engines as embedded components of specific applications, which means they are in a position to take advantage of additional knowledge about the text. Although SSML allows improved control over the text normalization process, most attention has focused on prosody markup, especially since prosody prediction is generally acknowledged as one of the most significant problems in TTS synthesis. Prosody control is by no means simple, owing to the strong interdependence among the different aspects of prosody, and it is particularly complex for concatenative TTS systems. SSML is about much more than prosody control, though: it allows high-level engine control such as language switching and voice switching, and low-level control such as phonetic input for words. We discuss our experiences in implementing these diverse requirements of the SSML standard.
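To make the kinds of control named in the abstract concrete, the following is a minimal sketch rather than anything taken from the paper. It embeds a small SSML 1.0 document that uses text normalization hints (`say-as`), prosody markup (`prosody`), voice switching (`voice`), language switching (`xml:lang`), and phonetic input (`phoneme`), then checks that it is well-formed using Python's standard `xml.etree.ElementTree`. The sample sentences, the `interpret-as="time"` value, the IPA transcription, and the helper names `controls_used` and `languages_used` are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace of the predefined "xml" prefix, needed to read xml:lang attributes.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical sample document (not from the paper). The interpret-as value
# "time" is an assumption: SSML 1.0 does not standardize these values.
SSML_DOC = """\
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-GB">
  <s>Your flight departs at <say-as interpret-as="time">10:45</say-as>.</s>
  <s><prosody rate="slow" pitch="high">Please listen carefully.</prosody></s>
  <voice gender="female"><s xml:lang="fr-FR">Bienvenue à bord.</s></voice>
  <s>I say <phoneme alphabet="ipa" ph="təˈmɑːtəʊ">tomato</phoneme>.</s>
</speak>
"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.split("}", 1)[-1]


def controls_used(ssml: str) -> list[str]:
    """Return the control elements in document order, skipping structural ones."""
    root = ET.fromstring(ssml)  # raises ParseError if the markup is malformed
    structural = {"speak", "s"}
    return [local_name(el.tag) for el in root.iter()
            if local_name(el.tag) not in structural]


def languages_used(ssml: str) -> list[str]:
    """Return every xml:lang value in document order."""
    root = ET.fromstring(ssml)
    attr = f"{{{XML_NS}}}lang"
    return [el.get(attr) for el in root.iter() if el.get(attr) is not None]


if __name__ == "__main__":
    print(controls_used(SSML_DOC))
    print(languages_used(SSML_DOC))
```

The sketch only inspects the markup. How an engine honours each element, in particular how a concatenative system realizes the requested prosody, is the implementation question the paper addresses.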
Bibliographic reference. Breen, Andrew P. / Minnis, Steve / Eggleton, Barry (2003): "Implementing an SSML compliant concatenative TTS system", In EUROSPEECH-2003, 2425-2428.