Eighth ISCA Workshop on Speech Synthesis

Barcelona, Catalonia, Spain
August 31-September 2, 2013

Interactional Adequacy as a Factor in the Perception of Synthesized Speech

Timo Baumann (1), David Schlangen (2)

(1) Universität Hamburg, Germany; (2) Bielefeld University, Germany

Speaking as part of a conversation is different from reading aloud. Speech synthesis systems, however, are typically developed under assumptions, at least implicit ones, that hold better for the latter situation than for the former. We address one particular aspect: the assumption that a fully formulated sentence is available for synthesis. We have built a system that does not make this assumption and can instead synthesize speech from incrementally extended input. In an evaluation experiment, we found that in a dynamic domain, where what is talked about changes quickly, subjects rated the output of this system as more naturally pronounced than that of a baseline system that employed standard synthesis, even though its quality was objectively degraded. Our results highlight the importance of considering a synthesizer’s ability to support interactive use cases when determining the adequacy of synthesized speech.

Index Terms: speech synthesis, incremental processing, interactive behaviour, evaluation, adequacy

