Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Using Cross-Syllable Units for Cantonese Speech Synthesis

Ka Man Law, Tan Lee

Department of Electronic Engineering, The Chinese University of Hong Kong

Monosyllables have been widely accepted as the basic units for concatenative speech synthesis of Chinese dialects. However, concatenating individual syllables is not adequate for producing highly natural synthetic speech because of improper coupling at syllable boundaries. This paper describes a preliminary study of using cross-syllable units for Cantonese speech synthesis. The acoustic inventory contains 1725 cross-syllable units, which are excised from properly selected and recorded carrier words. TD-PSOLA is employed for prosodic modification of the synthetic speech. The results of subjective listening tests reveal that the proposed use of cross-syllable units has potential for producing highly natural synthetic speech, although the currently achieved performance is only fair. Substantial improvement is anticipated with better smoothing techniques for waveform concatenation and greater coverage of the context-dependent variation of the acoustic units.



Bibliographic reference. Law, Ka Man / Lee, Tan (2000): "Using cross-syllable units for Cantonese speech synthesis", in ICSLP-2000, vol. 2, 407-410.