ESCA Workshop on Spoken Dialogue Systems, Vigsø, Denmark
In the area of speech synthesis it is already possible to generate understandable speech with citation-form prosody for simple written texts. At ATR, however, we are researching speech synthesis techniques for use in a speech translation environment. The dialogues in such an environment involve much richer forms of prosodic variation than are required for reading texts aloud. For our translations to sound natural, our synthesis system must offer a wide range of prosodic variability that can be described at an appropriate level of abstraction.
This paper describes a multi-level intonation system that generates a fundamental frequency (F0) contour from input labelled with high-level discourse information, including speech act type and focusing information, as well as part of speech and syntactic constituent structure. The system is rule-driven, but the rules and even some elements of the intonation system are derived from naturally spoken dialogues.
Bibliographic reference. Black, Alan W. / Campbell, Nick (1995): "Predicting the intonation of discourse segments from examples in dialogue speech", In SDS-1995, 197-200.