ISCA Archive SSW 2010

Evaluating prosody in synthetic speech with online (eye-tracking) and offline (rating) methods

Rajakrishnan Rajkumar, Michael White, Shari R. Speer, Kiwako Ito

This study examines the relationship between online processing effects observed in earlier eye-tracking experiments [1, 2] and two other properties of the synthetic and natural speech stimuli used in those experiments: offline quality ratings and acoustic-prosodic measurements. White et al. [2] reported that even high-quality synthetic speech failed to replicate the facilitative effect of contextually appropriate accent patterns found with human speech, while it produced a more robust intonational garden-path effect with contextually inappropriate patterns. They conjectured that both effects could be due to processing delays observed with the synthetic speech. In this paper, we present an acoustic analysis of the eye-tracking stimuli and an offline stimulus-rating task, designed to investigate whether a context-independent measure of utterance quality could predict processing-based effects. The analysis reveals that for synthetic speech, longer adjectives, which provide more processing time, do facilitate anticipatory looks to the target. Larger F0 drops (the difference between the F0 of the adjective and that of the following noun) also had a negative effect on looks to the target and were negatively correlated with offline ratings, suggesting that F0 drop may be a specific acoustic factor that merits attention in future work on improving synthesis quality. Finally, the study shows that online measures of unconscious processing and offline measures of conscious judgments, taken together, can provide a more comprehensive evaluation of synthetic speech than either method alone.
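The F0 drop measure and its reported relation to the offline ratings can be made concrete with a short sketch. The Python below assumes that each word's F0 is summarized as the mean over the voiced frames of a pitch track, with unvoiced frames marked `None` or `0`; the abstract does not say whether mean, peak, or some other F0 value was used, so this is an assumption. The names `word_f0`, `f0_drop`, and `pearson_r` are hypothetical, and no data from the study are included.

```python
import math
from statistics import mean
from typing import Optional, Sequence


def word_f0(frames: Sequence[Optional[float]]) -> float:
    """Summarize a word's F0 (Hz) as the mean over voiced frames.

    Assumption: unvoiced frames are encoded as None or 0 in the pitch track.
    """
    voiced = [f for f in frames if f is not None and f > 0]
    if not voiced:
        raise ValueError("word has no voiced frames")
    return mean(voiced)


def f0_drop(adjective_frames: Sequence[Optional[float]],
            noun_frames: Sequence[Optional[float]]) -> float:
    """F0 drop = F0(adjective) - F0(following noun), in Hz.

    A positive value means pitch falls from the adjective onto the noun.
    """
    return word_f0(adjective_frames) - word_f0(noun_frames)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, e.g. per-stimulus F0 drop vs. mean offline rating."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with >= 2 items")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

Under these assumptions, a negative `pearson_r` between per-stimulus `f0_drop` values and mean ratings would correspond to the pattern the abstract describes. A perceptually motivated variant could express the drop in semitones (12 * log2 of the F0 ratio) rather than Hz.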

[1] K. Ito and S. R. Speer, “Semantically-independent but contextually-dependent interpretation of contrastive accent,” in Prosodic categories: production, perception and comprehension, P. Prieto, S. Frota, and G. Elordieta, Eds. Springer, to appear.

[2] M. White, R. Rajkumar, K. Ito, and S. R. Speer, “Eye tracking for the online evaluation of prosody in speech synthesis: Not so fast!” in Proc. of the 10th Annual Conference of the International Speech Communication Association (INTERSPEECH-09), 2009.

Index Terms: speech synthesis, evaluation, prosody, eye tracking, unit selection

Cite as: Rajkumar, R., White, M., Speer, S.R., Ito, K. (2010) Evaluating prosody in synthetic speech with online (eye-tracking) and offline (rating) methods. Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7), 276-281

  author={Rajakrishnan Rajkumar and Michael White and Shari R. Speer and Kiwako Ito},
  title={{Evaluating prosody in synthetic speech with online (eye-tracking) and offline (rating) methods}},
  booktitle={Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7)},