ISCA Archive SpeechProsody 2002

Duration control by asymmetric causal retro-causal neural networks

Caglayan Erdem, Hans Georg Zimmermann

Generating pleasant prosody parameters is essential for speech synthesis. A prosody generation unit can be viewed as a dynamical system.

In this paper, we present time-delay recurrent neural network (NN) topologies that can be used to model dynamical systems. In the prosody prediction task, both left and right context information are known to influence the prediction of prosody control parameters. This can be modeled by causal and retro-causal information flows. Since some of the information available during training is unavailable during application, the structure of the information flow changes between training and application. This structural change is handled by two asymmetric architectures.
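As a rough illustration of combined causal and retro-causal information flows, the following minimal sketch runs one recurrence forward over a sequence (left context) and one backward (right context), then sums both states at each position. It is not the architecture from the paper: the function name `causal_retro_causal`, the scalar weights, and the `tanh` state update are assumptions chosen for brevity, and the asymmetric switch between training and application is not modeled.

```python
import math


def causal_retro_causal(xs, a=0.5, b=1.0, a_retro=0.5, b_retro=1.0, c=1.0, d=1.0):
    """Toy scalar recurrence with a causal and a retro-causal state.

    All weights are hypothetical scalars; a real model would use
    weight matrices learned from data.
    """
    n = len(xs)

    # Causal pass: the state at position t depends on the state at t-1,
    # so it summarizes the left context.
    s = [0.0] * n
    prev = 0.0
    for t in range(n):
        prev = math.tanh(a * prev + b * xs[t])
        s[t] = prev

    # Retro-causal pass: the state at position t depends on the state at t+1,
    # so it summarizes the right context.
    r = [0.0] * n
    nxt = 0.0
    for t in range(n - 1, -1, -1):
        nxt = math.tanh(a_retro * nxt + b_retro * xs[t])
        r[t] = nxt

    # The output at each position combines both information flows.
    return [c * s[t] + d * r[t] for t in range(n)]
```

With `d=0.0` the retro-causal flow is switched off, so the first output no longer depends on later inputs; with the default `d=1.0`, changing the last input also changes the first output.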

The proposed architectures allow further a priori knowledge to be integrated. This improves the performance of the duration control unit in our text-to-speech (TTS) system Papageno.


Cite as: Erdem, C., Zimmermann, H.G. (2002) Duration control by asymmetric causal retro-causal neural networks. Proc. Speech Prosody 2002, 271-274

@inproceedings{erdem02b_speechprosody,
  author={Caglayan Erdem and Hans Georg Zimmermann},
  title={{Duration control by asymmetric causal retro-causal neural networks}},
  year=2002,
  booktitle={Proc. Speech Prosody 2002},
  pages={271--274}
}