ISCA Archive ICSLP 2000

A hybrid statistical/RNN approach to prosody synthesis for taiwanese TTS

Sin-Horng Chen, Chen-Chung Ho

This paper proposes a hybrid approach that incorporates statistical modeling of prosodic parameters into recurrent neural network (RNN)-based prosody synthesis for Min-Nan (Taiwanese) speech. It takes the syllable as the basic synthesis unit and constructs statistical models for syllable-initial duration, syllable-final duration, inter-syllable pause duration, the pitch contour of the syllable, and the log-energy level of the syllable. In training, it normalizes the prosodic parameters with these statistical models and uses the results to train an RNN prosody synthesizer. In synthesis, it denormalizes the RNN outputs with the same statistical models to generate all prosodic parameters required by the TTS system. The advantage of the approach is that the statistical models account for some of the affecting factors, which relieves the RNN prosody synthesizer of the need to model them.
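The normalize-then-denormalize loop described above can be illustrated with a minimal sketch. The abstract does not state the form of the statistical models, so this example assumes a simple per-factor mean and standard deviation (a z-score), with tone used as a hypothetical affecting factor for syllable-final duration. The function names and the data are illustrative only and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fit_stats(samples):
    """Estimate (mean, std) of a prosodic parameter for each affecting factor.

    samples: iterable of (factor, value) pairs, e.g. ("tone1", 180.0).
    Assumption: one Gaussian-like model per factor; the paper's models may differ.
    """
    groups = defaultdict(list)
    for factor, value in samples:
        groups[factor].append(value)
    stats = {}
    for factor, values in groups.items():
        mu = mean(values)
        sigma = pstdev(values) or 1.0  # avoid division by zero for constant groups
        stats[factor] = (mu, sigma)
    return stats


def normalize(value, factor, stats):
    """Training side: remove the factor's effect before the value reaches the RNN."""
    mu, sigma = stats[factor]
    return (value - mu) / sigma


def denormalize(z, factor, stats):
    """Synthesis side: map an RNN output back to the original parameter scale."""
    mu, sigma = stats[factor]
    return z * sigma + mu


# Hypothetical syllable-final durations in milliseconds, grouped by tone.
training = [("tone1", 180.0), ("tone1", 220.0), ("tone2", 150.0), ("tone2", 170.0)]
stats = fit_stats(training)

z = normalize(220.0, "tone1", stats)      # target the RNN would be trained on
restored = denormalize(z, "tone1", stats)  # what the TTS system would receive
```

In this sketch the RNN only has to predict the normalized residual `z`, while the factor-dependent mean and spread are restored by the same statistics at synthesis time.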


Cite as: Chen, S.-H., Ho, C.-C. (2000) A hybrid statistical/RNN approach to prosody synthesis for taiwanese TTS. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 1, 613-616

@inproceedings{chen00c_icslp,
  author={Sin-Horng Chen and Chen-Chung Ho},
  title={{A hybrid statistical/RNN approach to prosody synthesis for taiwanese TTS}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 1, 613-616}
}