When humans speak, they do not plan their full utterance in all detail before
beginning to speak, nor do they speak piece by piece while ignoring their full
message. Instead, humans use partial representations in which they fill
in the missing parts as the utterance unfolds. Incremental speech synthesizers,
in contrast, have not yet made use of partial representations and the information
We analyze the quality of prosodic parameter assignments (pitch and duration) generated from partial utterance specifications (substituting defaults for missing features) in order to determine the requirements that symbolic incremental prosody modelling should meet. We find that broader, higher-level information helps to improve prosody even if lower-level information about the near future is not yet available. Furthermore, we find that symbolic phrase-level or utterance-level information is most helpful towards the end of the phrase or utterance, respectively, that is, when this information becomes available even in the incremental case. Thus, the negative impact of incremental processing can be minimized by using partial representations that are filled in incrementally.
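The idea of a partial utterance specification with defaults substituted for missing features can be illustrated with a minimal sketch. The feature names, default values, and helper functions below are hypothetical and chosen only for illustration; they are not taken from the paper or from any particular synthesizer.

```python
from typing import Dict, List, Optional

# Hypothetical symbolic features and default values (assumptions for
# illustration, not the feature set used in the paper).
DEFAULTS: Dict[str, int] = {
    "syllables_to_phrase_end": 3,
    "words_to_utterance_end": 5,
    "next_syllable_stressed": 0,
}


def complete_specification(
    partial: Dict[str, Optional[int]],
    defaults: Dict[str, int] = DEFAULTS,
) -> Dict[str, int]:
    """Return a full feature set, using the default wherever a feature
    is absent or None in the partial specification."""
    full: Dict[str, int] = {}
    for name, default in defaults.items():
        value = partial.get(name)
        full[name] = default if value is None else value
    return full


def missing_features(
    partial: Dict[str, Optional[int]],
    defaults: Dict[str, int] = DEFAULTS,
) -> List[str]:
    """List the features that still have to be filled in, sorted by name."""
    return sorted(name for name in defaults if partial.get(name) is None)


# Utterance-level information is known; near-future details are not yet.
early = {"words_to_utterance_end": 2}
print(complete_specification(early))
print(missing_features(early))
```

As more of the utterance becomes known, the same partial specification can be updated and passed through `complete_specification` again, so that fewer defaults are used over time.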
Bibliographic reference. Baumann, Timo (2014): "Partial representations improve the prosody of incremental speech synthesis", In INTERSPEECH-2014, 2932-2936.