Monosyllables have been widely accepted as the basic units for concatenative speech synthesis of Chinese dialects. However, concatenating individual syllables is not adequate for producing highly natural synthetic speech because of improper coupling at syllable boundaries. This paper describes a preliminary study of cross-syllable units for Cantonese speech synthesis. The acoustic inventory contains 1725 cross-syllable units, which are excised from carefully selected and recorded carrier words. TD-PSOLA is employed for prosodic modification of the synthetic speech. The results of subjective listening tests show that cross-syllable units have the potential to produce highly natural synthetic speech, although the performance achieved so far is only fair. Substantial improvement is anticipated from better smoothing techniques for waveform concatenation and greater coverage of the context-dependent variation of the acoustic units.
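The abstract points to smoothing at waveform concatenation points as one route to improvement. As a rough illustration only, the sketch below joins two waveform units with a linear cross-fade over a short overlap. This is a generic smoothing technique and an assumption on our part, not the method used in the paper; the function name `crossfade_concat` and the `overlap` parameter are hypothetical, and the units are represented as plain lists of samples.

```python
def crossfade_concat(left, right, overlap):
    """Join two sample sequences, linearly cross-fading `overlap` samples.

    Illustrative only: a generic boundary-smoothing sketch, not the
    concatenation scheme described in the paper.
    """
    if overlap < 0 or overlap > min(len(left), len(right)):
        raise ValueError("overlap must be between 0 and the shorter unit length")
    if overlap == 0:
        # No smoothing requested: plain concatenation.
        return list(left) + list(right)

    start = len(left) - overlap
    head = list(left[:start])      # part of the left unit kept unchanged
    tail = list(right[overlap:])   # part of the right unit kept unchanged

    blended = []
    for i in range(overlap):
        # Weight of the right unit rises linearly and stays strictly
        # between 0 and 1 inside the overlap region.
        w = (i + 1) / (overlap + 1)
        blended.append((1.0 - w) * left[start + i] + w * right[i])

    return head + blended + tail
```

The output is `overlap` samples shorter than the two units laid end to end, because the overlapping regions are merged rather than appended. A practical system would more likely place the join pitch-synchronously, which this sketch does not attempt.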
Cite as: Law, K.M., Lee, T. (2000) Using cross-syllable units for Cantonese speech synthesis. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 2, 407-410, doi: 10.21437/ICSLP.2000-294
@inproceedings{law00_icslp,
  author={Ka Man Law and Tan Lee},
  title={{Using cross-syllable units for Cantonese speech synthesis}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 2, 407-410},
  doi={10.21437/ICSLP.2000-294}
}