We present a study on the relation between fundamental frequency (F0) and its perceptual effect in the context of text-to-speech (TTS) synthesis. Features that essentially capture the intonational (macro-prosodic) properties of speech are introduced and analysed with regard to the following questions: (i) How does the prosodic variation of TTS signals differ from that of natural speech? (ii) Is there a functional relationship between the prosodic variation of TTS signals and their perceived quality? In answering these questions, we present novel approaches for the construction of non-intrusive quality estimators. The results reveal a substantial degree of systematic influence of prosodic variation on TTS quality.
Index Terms: Speech quality, instrumental quality assessment, text-to-speech (TTS), prosody.
Bibliographic reference. Norrenbrock, Christoph R. / Hinterleitner, Florian / Heute, Ulrich / Möller, Sebastian (2012): "Quality analysis of macroprosodic F0 dynamics in text-to-speech signals", In INTERSPEECH-2012, 454-457.