15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Partial Representations Improve the Prosody of Incremental Speech Synthesis

Timo Baumann

Universität Hamburg, Germany

When humans speak, they neither plan their full utterance in every detail before starting to speak nor proceed piece by piece while ignoring their overall message; instead, they use partial representations and fill in the missing parts as the utterance unfolds. Incremental speech synthesizers, in contrast, have not yet made use of partial representations and the information they contain.
We analyze the quality of prosodic parameter assignments (pitch and duration) generated from partial utterance specifications, in which defaults are substituted for missing features, in order to determine the requirements that symbolic incremental prosody modelling should meet. We find that broader, higher-level information helps to improve prosody even when lower-level information about the near future is not yet available. Furthermore, we find that symbolic phrase-level or utterance-level information is most helpful towards the end of the phrase or utterance, respectively, that is, precisely when this information becomes available even in the incremental case. Thus, the negative impact of incremental processing can be minimized by using partial representations that are filled in incrementally.
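The idea of generating prosody from a partial utterance specification, with defaults substituted for features that are not yet known, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the system evaluated in the paper. The feature names, their grouping into linguistic levels, the default values and the helper `partial_spec` are all hypothetical. The example keeps higher-level (phrase and utterance) information while lower-level information about the near future is replaced by defaults.

```python
from typing import Dict, Set

# Hypothetical defaults, substituted whenever a feature is not yet known.
DEFAULTS: Dict[str, object] = {
    "next_phone": "pau",
    "next_syllable_stressed": False,
    "next_word_accented": False,
    "phrase_type": "declarative",
    "utterance_type": "statement",
}

# Hypothetical assignment of each feature to a linguistic level.
LEVEL_OF: Dict[str, str] = {
    "next_phone": "segment",
    "next_syllable_stressed": "syllable",
    "next_word_accented": "word",
    "phrase_type": "phrase",
    "utterance_type": "utterance",
}


def partial_spec(full: Dict[str, object], known_levels: Set[str]) -> Dict[str, object]:
    """Keep features whose level is already known; use defaults for the rest."""
    return {
        name: (value if LEVEL_OF[name] in known_levels else DEFAULTS[name])
        for name, value in full.items()
    }


# Hypothetical full specification for one position in an utterance.
full = {
    "next_phone": "t",
    "next_syllable_stressed": True,
    "next_word_accented": True,
    "phrase_type": "question",
    "utterance_type": "question",
}

# Phrase- and utterance-level information is known; the near future is not.
print(partial_spec(full, {"phrase", "utterance"}))
# {'next_phone': 'pau', 'next_syllable_stressed': False, 'next_word_accented': False, 'phrase_type': 'question', 'utterance_type': 'question'}
```

Each level is treated here as either fully known or fully unknown. A real incremental system would more likely reveal features gradually as each increment arrives, but this on/off split is enough to show how broader information can be kept while missing details fall back to defaults.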


Bibliographic reference. Baumann, Timo (2014): "Partial representations improve the prosody of incremental speech synthesis", In INTERSPEECH-2014, 2932-2936.