This paper describes our recent investigation into the use of both intra-syllable and cross-syllable acoustic units for Cantonese text-to-speech synthesis. In our previous work, isolated monosyllable units were used for concatenative synthesis of Cantonese speech. The resulting synthetic speech was judged unnatural, with an obvious lack of perceptual continuity. The proposed system adopts an acoustic inventory that covers all legitimate intra-syllable and cross-syllable acoustic units. Synthetic speech produced by concatenating such sub-syllable units better captures the transitory effects that are crucial to perceived naturalness. Different strategies are used to concatenate speech segments with different acoustic-phonetic properties. A subjective listening test shows a noticeable improvement, attributable mainly to smoother transitions between sonorant segments.
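The idea of choosing a concatenation strategy per boundary can be illustrated with a minimal Python sketch. It is not the method of the paper: the `Unit` type, its `sonorant_start` and `sonorant_end` flags, the linear cross-fade, and the `overlap` length are all assumptions made for illustration. Here, sonorant-to-sonorant boundaries are cross-faded and all other boundaries are simply abutted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Unit:
    """Hypothetical acoustic unit: raw samples plus boundary properties."""
    samples: List[float]
    sonorant_start: bool  # assumed flag: unit begins in a sonorant segment
    sonorant_end: bool    # assumed flag: unit ends in a sonorant segment


def crossfade_concat(a: List[float], b: List[float], overlap: int) -> List[float]:
    """Join a and b, linearly cross-fading the last/first `overlap` samples."""
    if overlap < 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must fit inside both segments")
    if overlap == 0:
        return list(a) + list(b)
    head = list(a[:len(a) - overlap])
    tail = list(b[overlap:])
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of b rises across the overlap
        mixed.append((1.0 - w) * a[len(a) - overlap + i] + w * b[i])
    return head + mixed + tail


def concatenate(units: List[Unit], overlap: int = 4) -> List[float]:
    """Concatenate units, choosing a strategy at each boundary."""
    if not units:
        return []
    out = list(units[0].samples)
    for prev, cur in zip(units, units[1:]):
        if prev.sonorant_end and cur.sonorant_start:
            # Assumed strategy: smooth the join between sonorant segments.
            n = min(overlap, len(out), len(cur.samples))
            out = crossfade_concat(out, cur.samples, n)
        else:
            # Assumed strategy: abut the segments without smoothing.
            out.extend(cur.samples)
    return out
```

With two four-sample units and `overlap=2`, a sonorant boundary yields six samples (two are blended), while a non-sonorant boundary yields all eight.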
Cite as: Law, K.M., Lee, T., Lau, W. (2001) Cantonese text-to-speech synthesis using sub-syllable units. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 991-994, doi: 10.21437/Eurospeech.2001-263
@inproceedings{law01_eurospeech,
  author={K. M. Law and Tan Lee and Wai Lau},
  title={{Cantonese text-to-speech synthesis using sub-syllable units}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={991--994},
  doi={10.21437/Eurospeech.2001-263}
}