Incremental speech synthesis (iSS) accepts input and produces output in consecutive chunks that only together result in a full utterance. Systems that use iSS can therefore adapt their utterances while they are ongoing. Planning the acoustic realisation with less than the full utterance available has downsides, however, as global optimisation is no longer possible. In this paper we present a strategy for incrementalizing the symbolic pre-processing component of speech synthesis and assess the influence of a reduction in "lookahead", i.e. in knowledge about the rest of the utterance, on prosodic quality. We found that high-quality incremental output can be achieved even with a lookahead of slightly less than one phrase, allowing for timely system reaction.
Index Terms: speech synthesis, spoken dialogue systems, incrementality, prosody
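The notion of "lookahead" in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `incremental_chunks`, the word-level granularity, and the placeholder word labels are assumptions made purely for illustration. The sketch only shows how each unit would be released for processing while a limited number of following units is visible.

```python
from typing import Iterator, List, Tuple


def incremental_chunks(
    words: List[str], lookahead: int
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (word, visible_future) pairs.

    Hypothetical helper: each word is released for processing together
    with at most `lookahead` following words, which stand in for the
    "knowledge about the rest of the utterance" available at that point.
    """
    if lookahead < 0:
        raise ValueError("lookahead must be non-negative")
    for i, word in enumerate(words):
        # Slice is clipped automatically at the end of the utterance.
        yield word, words[i + 1 : i + 1 + lookahead]


# Placeholder word labels, not data from the paper.
words = ["w0", "w1", "w2", "w3"]
steps = list(incremental_chunks(words, lookahead=1))
for current, future in steps:
    print(current, "sees", future)
```

With `lookahead=0` every word would be processed with no knowledge of what follows. With a lookahead at least as long as the utterance, the first word already sees the whole remainder, which corresponds to the non-incremental case in which global optimisation is possible.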
Bibliographic reference. Baumann, Timo / Schlangen, David (2012): "Evaluating prosodic processing for incremental speech synthesis", In INTERSPEECH-2012, 438-441.