13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Quality Analysis of Macroprosodic F0 Dynamics in Text-to-Speech Signals

Christoph R. Norrenbrock (1), Florian Hinterleitner (2), Ulrich Heute (1), Sebastian Möller (2)

(1) Digital Signal Processing and System Theory, Christian-Albrechts-Universität zu Kiel, Germany
(2) Quality and Usability Lab, Telekom Innovation Laboratories, TU Berlin, Germany

We present a study of the relation between fundamental frequency (F0) and its perceptual effect in the context of text-to-speech (TTS) synthesis. Features that capture the intonational (macroprosodic) properties of speech are introduced and analysed with regard to two questions: (i) How does the prosodic variation of TTS signals differ from that of natural speech? (ii) Is there a functional relationship between the prosodic variation of TTS signals and their perceived quality? In answering these questions, we present novel approaches for the construction of non-intrusive quality estimators. The results reveal a substantial systematic influence of prosodic variation on TTS quality.

Index Terms: Speech quality, instrumental quality assessment, text-to-speech (TTS), prosody.


Bibliographic reference.  Norrenbrock, Christoph R. / Hinterleitner, Florian / Heute, Ulrich / Möller, Sebastian (2012): "Quality analysis of macroprosodic F0 dynamics in text-to-speech signals", In INTERSPEECH-2012, 454-457.