ISCA Archive SSW 2004

A corpus-based approach to expressive speech synthesis

E. Eide, A. Aaron, R. Bakis, W. Hamza, Michael Picheny, J. Pitrelli

Human speech communication can be thought of as comprising two channels: the words themselves, and the style in which they are spoken. Each of these channels carries information. Today's most advanced text-to-speech (TTS) systems, such as [1], [2], [3], [4], fall far short of human speech because they offer only a single, fixed style of delivery, independent of the message. In this paper, we describe the IBM Expressive TTS Engine, which adds the style channel by offering five speaking styles: neutral declarative, conveying good news, conveying bad news, asking a question, and showing contrastive emphasis. In addition to generating speech in these five styles, our TTS system can also generate paralinguistic events such as sighs, breaths, and filled pauses, which further enrich the style channel. We describe our methods for generating and evaluating expressive synthetic speech and paralinguistic effects, and we show significant perceptual differences between expressive and neutral synthetic speech for each of our speaking styles. Finally, we describe how our extensions [5] of the Speech Synthesis Markup Language (SSML) [6] let users easily communicate the desired expression to the TTS engine.
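To illustrate what communicating a style through markup might look like, here is a minimal sketch that wraps text in a standard SSML 1.0 `speak` root and tags it with one of the five styles. The `expressive` element, its `style` attribute, and the style labels are hypothetical placeholders: they are not the actual extension syntax of [5], which is not given here.

```python
import xml.etree.ElementTree as ET

# Standard SSML 1.0 namespace (W3C).
SSML_NS = "http://www.w3.org/2001/10/synthesis"

# The five speaking styles named in the abstract. These string labels are
# HYPOTHETICAL and do not reflect the values used by the IBM engine.
STYLES = {"neutral", "good_news", "bad_news", "question", "contrastive_emphasis"}


def build_ssml(text: str, style: str) -> str:
    """Return an SSML document marking `text` with a requested speaking style."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style}")
    speak = ET.Element(
        "speak", {"version": "1.0", "xmlns": SSML_NS, "xml:lang": "en-US"}
    )
    # HYPOTHETICAL extension element: standard SSML has no style element,
    # so this only stands in for the kind of markup an extension could add.
    styled = ET.SubElement(speak, "expressive", {"style": style})
    styled.text = text  # ElementTree escapes any XML special characters
    return ET.tostring(speak, encoding="unicode")


print(build_ssml("We won the contract!", "good_news"))
```

Keeping the style as markup rather than as an engine setting lets one document switch styles sentence by sentence.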

Cite as: Eide, E., Aaron, A., Bakis, R., Hamza, W., Picheny, M., Pitrelli, J. (2004) A corpus-based approach to expressive speech synthesis. Proc. 5th ISCA Workshop on Speech Synthesis (SSW 5), 79-84

  author={E. Eide and A. Aaron and R. Bakis and W. Hamza and Michael Picheny and J. Pitrelli},
  title={{A corpus-based approach to expressive speech synthesis}},
  booktitle={Proc. 5th ISCA Workshop on Speech Synthesis (SSW 5)},