13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Evaluating Prosodic Processing for Incremental Speech Synthesis

Timo Baumann (1), David Schlangen (2)

(1) Department of Informatics, University of Hamburg, Germany
(2) Faculty of Linguistics and Literary Studies, Bielefeld University, Germany

Incremental speech synthesis (iSS) accepts input and produces output in consecutive chunks that only together result in a full utterance. Systems that use iSS can thus adapt their utterances while they are ongoing. Planning the acoustic realisation with less than the full utterance available has a downside, however: global optimisation is no longer possible. In this paper we present a strategy for incrementalising the symbolic pre-processing component of speech synthesis and assess how a reduction in "lookahead", i.e. in knowledge about the rest of the utterance, affects prosodic quality. We found that high-quality incremental output can be achieved even with a lookahead of slightly less than one phrase, allowing for timely system reaction.

Index Terms: speech synthesis, spoken dialogue systems, incrementality, prosody

Audio Examples: control, trivial, w0, w1, w2, w3, wn, wn1

Bibliographic reference. Baumann, Timo / Schlangen, David (2012): "Evaluating prosodic processing for incremental speech synthesis", In INTERSPEECH-2012, 438-441.