7th International Conference on Spoken Language Processing
September 16-20, 2002
To estimate the degradation of naturalness in concatenative speech synthesis caused by a mismatch between the syllable context in which a unit was selected and the context in which it is used, perceptual experiments were conducted using speech stimuli synthesized by concatenating the preceding Final (vowel) of the first syllable and the succeeding Initial (consonant) of the second syllable, and by combining the tone positions. For substitution of the succeeding final of one syllable and the preceding initial of the next syllable, the results showed that naturalness was low where speech segmentation was difficult, as with [s, m, y, w]. For substitution of the tone, the results showed that naturalness was (1) high for a combination of high tone (1st) and rising tone (2nd), and (2) low for a combination of low tone (3rd) and rising tone (2nd). From these results, the search time can be reduced by 43% for tone selection, with the same effect on the selection of the succeeding final of one syllable and the preceding initial of the next syllable.
Bibliographic reference. Lu, Jinlin / Kawai, Hisashi (2002): "Perceptual evaluation of naturalness due to substitution of Chinese syllable for concatenative speech synthesis", In ICSLP-2002, 2377-2380.