This paper describes a pioneering study on prosodic control for Cantonese text-to-speech synthesis. We attempt to establish a set of segment-level duration rules and context-dependent F0 profiles and apply them to a syllable-based concatenative speech synthesizer that uses TD-PSOLA as its prosodic modification technique. The prosodic features are extracted by statistical characterization of a large amount of speech data. A subjective listening test shows that the micro-prosodic control results in a marginal but consistent improvement in perceptual naturalness.
Cite as: Lee, T., Meng, H.M., Lau, W.H., Lo, W.K., Ching, P.C. (1999) Micro-prosodic control in Cantonese text-to-speech synthesis. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 1855-1858, doi: 10.21437/Eurospeech.1999-405
@inproceedings{lee99_eurospeech, author={Tan Lee and Helen M. Meng and Wai H. Lau and W. K. Lo and P. C. Ching}, title={{Micro-prosodic control in Cantonese text-to-speech synthesis}}, year=1999, booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)}, pages={1855--1858}, doi={10.21437/Eurospeech.1999-405} }