ISCA Archive Eurospeech 2001

Accent label prediction by time delay neural networks using gating clusters

Achim F. Müller, Rüdiger Hoffmann

This paper presents a new neural network (NN) architecture for data-driven prediction of accent labels (perceptual accents and pitch accents) for speech synthesis. The proposed architecture applies gating clusters in a time delay (TD) framework. The gating clusters adapt the network structure dynamically so that only the input feature vectors actually available in the current context window are processed. The architecture has been applied successfully to word-level accent label prediction within our text-to-speech (TTS) system. Prediction accuracy was 86.1% on our German corpus and 84.5% on an English corpus. This result is superior to results achieved on the same corpus with an approach based on classification and regression tree (CART) techniques [1]. The results were achieved with a simpler feature set than that used in [1].

[1] K. Ross and M. Ostendorf, "Prediction of abstract prosodic labels for speech synthesis"
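The gating idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture, which the abstract does not specify in detail. It assumes that a window slot counts as "unavailable" when it falls outside the sentence, and that each slot contributes a weighted sum plus a bias that a binary gate switches on or off. The names `context_window` and `gated_activation` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def context_window(
    features: Sequence[Sequence[float]], center: int, half_width: int
) -> Tuple[List[Vector], List[float]]:
    """Collect the feature vectors around `center` together with one gate per slot.

    Assumption: a slot is unavailable when its position lies outside the
    sentence. Unavailable slots get a zero vector and a gate of 0.0.
    """
    dim = len(features[0])
    vectors: List[Vector] = []
    gates: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(features):
            vectors.append(list(features[pos]))
            gates.append(1.0)
        else:
            vectors.append([0.0] * dim)
            gates.append(0.0)
    return vectors, gates


def gated_activation(
    vectors: Sequence[Sequence[float]],
    gates: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
) -> float:
    """Sum the per-slot contributions, skipping slots whose gate is closed.

    Each slot contributes gate * (w . x + b), so a closed gate removes the
    slot's bias as well as its weighted input.
    """
    total = 0.0
    for x, g, w, b in zip(vectors, gates, weights, biases):
        total += g * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return total


# Three words with two features each; the window is centred on the first word,
# so the left-hand slot has no word and its gate is closed.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
vectors, gates = context_window(features, center=0, half_width=1)
weights = [[1.0, 1.0]] * 3
biases = [0.5] * 3
activation = gated_activation(vectors, gates, weights, biases)
```

In this sketch the closed gate also suppresses the bias of the missing slot, which plain zero padding would not do. That is one way to read "only available input feature vectors ... are treated".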


doi: 10.21437/Eurospeech.2001-147

Cite as: Müller, A.F., Hoffmann, R. (2001) Accent label prediction by time delay neural networks using gating clusters. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 549-553, doi: 10.21437/Eurospeech.2001-147

@inproceedings{muller01_eurospeech,
  author={Achim F. Müller and Rüdiger Hoffmann},
  title={{Accent label prediction by time delay neural networks using gating clusters}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={549--553},
  doi={10.21437/Eurospeech.2001-147}
}