The effect of tonal context on Cantonese concatenative speech synthesis

Tien-Ying Fung, Helen Meng

This paper describes our study of the effect of tonal context on Cantonese concatenative speech synthesis. We have previously developed CU VOCAL, a speech synthesizer that concatenates syllables to generate Cantonese and Mandarin speech [1, 2]. The preliminary version of CU VOCAL captures only the place of articulation as coarticulatory context, using distinctive features in unit selection [3]. However, we noticed discrepancies between the perceived tone and the desired tone for some Cantonese syllables in the synthesized speech, and these discrepancies degraded the perceived quality of the synthesis outputs. This suggests that our unit selection strategy should be extended to incorporate tonal context as well. To devise such a strategy, we studied the relative importance of the left and right tonal contexts in terms of their influence on the perceived tone of the current syllable. We also defined a scheme for measuring the difference between a desired syllable token and its tonal variants in terms of attributes such as tone shape, tone height and tone trajectory. Hence, if a desired syllable token is unavailable during concatenative synthesis, we can substitute the "closest" tonal variant as determined by our unit selection scheme.
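The abstract does not state how the tone-difference measure is computed. The following is a minimal sketch of the substitution idea, and it rests on several assumptions. Each of the six Cantonese tones is represented by commonly cited Chao tone letters (start and end pitch on a 1 to 5 scale). Tone shape, tone height (mean pitch) and tone trajectory (end minus start pitch) are then combined as a weighted sum with hypothetical unit weights. The contour values, the cost terms, and the names tone_distance and closest_variant are illustrative only and are not the authors' scheme.

```python
# Hypothetical sketch: pick the "closest" tonal variant of a syllable.
# Assumption: commonly cited Chao tone letters for the six Cantonese tones,
# as (start pitch, end pitch) on a 1-5 scale. Not taken from the paper.
CHAO_CONTOURS = {
    1: (5, 5),  # high level
    2: (2, 5),  # high rising
    3: (3, 3),  # mid level
    4: (2, 1),  # low falling
    5: (2, 3),  # low rising
    6: (2, 2),  # low level
}


def tone_shape(start: int, end: int) -> str:
    """Classify a contour as rising, falling or level."""
    if end > start:
        return "rising"
    if end < start:
        return "falling"
    return "level"


def tone_distance(a: int, b: int, w_shape: float = 1.0,
                  w_height: float = 1.0, w_traj: float = 1.0) -> float:
    """Hypothetical weighted distance over tone shape, height and trajectory."""
    sa, ea = CHAO_CONTOURS[a]
    sb, eb = CHAO_CONTOURS[b]
    # Shape cost: 0 if both contours have the same direction, else 1.
    shape_cost = 0.0 if tone_shape(sa, ea) == tone_shape(sb, eb) else 1.0
    # Height cost: difference in mean pitch of the two contours.
    height_cost = abs((sa + ea) / 2 - (sb + eb) / 2)
    # Trajectory cost: difference in overall pitch movement.
    traj_cost = abs((ea - sa) - (eb - sb))
    return w_shape * shape_cost + w_height * height_cost + w_traj * traj_cost


def closest_variant(desired_tone: int, available_tones: list[int]) -> int:
    """Return the available tone with the smallest distance to the desired one.

    Ties are broken by the lower tone number so the result is deterministic.
    """
    if not available_tones:
        raise ValueError("no tonal variants available")
    return min(available_tones,
               key=lambda t: (tone_distance(desired_tone, t), t))


if __name__ == "__main__":
    # A tone-2 token is missing; only the other five tones are in the inventory.
    print(closest_variant(2, [1, 3, 4, 5, 6]))
```

Under these assumptions, a missing high-rising tone-2 token would be replaced by its low-rising tone-5 variant, because the two contours share a rising shape and differ least in trajectory. A weighted sum keeps the three attributes separately tunable. In practice, the weights would have to be set from perceptual data such as the listening tests the abstract alludes to.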


Cite as: Fung, T.-Y., Meng, H. (2002) The effect of tonal context on cantonese concatenative speech synthesis. Proc. International Symposium on Chinese Spoken Language Processing, paper 66

  author={Tien-Ying Fung and Helen Meng},
  title={{The effect of tonal context on cantonese concatenative speech synthesis}},
  booktitle={Proc. International Symposium on Chinese Spoken Language Processing},
  pages={paper 66}