8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Quantitative Analysis and Synthesis of Syllabic Tones in Vietnamese

Hansjorg Mixdorff (1), Nguyen Hung Bach (2), Hiroya Fujisaki (3), Mai Chi Luong (2)

(1) Berlin University of Applied Sciences, Germany
(2) National Centre for Science and Technology, Vietnam
(3) University of Tokyo, Japan

The current paper presents a preliminary study on the production and perception of syllabic tones in Vietnamese. A speech corpus consisting of fifty-two six-syllable sequences with various tone combinations was uttered by two speakers of Standard Vietnamese, one male and one female. The corpus was labeled at the syllable level and analyzed using the Fujisaki model. Results show that the six tone types fall into two basic categories. The level, rising, curve and falling tones can be accurately modeled using tone commands of positive or negative polarity. The so-called drop and broken tones, however, evidently require a special control that causes creaky voice and, in some cases, a very fast drop in F0 leading to a temporary halving or even quartering of F0. In contrast to the drop tone, the broken tone exhibits an F0 rise, and hence a positive tone command, right after the creak occurs. Further observations suggest that the drop and broken tones differ from the other four tones not only in their F0 characteristics but also in their much tenser articulation. A perception experiment performed with natural and resynthesized stimuli shows, inter alia, that tone 4 is most prone to confusion and that tone 6 evidently requires tense articulation as well as vocal fry to be identified reliably.
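For readers unfamiliar with the Fujisaki model, the following is a minimal sketch of its standard formulation with tone commands, in which ln F0(t) is the sum of ln Fb, phrase components (impulse responses Gp) and tone components (step responses Ga) whose amplitudes may be positive or negative. It only illustrates how the level, rising, curve and falling tones can be approximated by tone commands of either polarity. It does not model the creaky voice or the F0 halving/quartering of the drop and broken tones. The default constants (alpha = 2.0/s, beta = 20.0/s, gamma = 0.9) and all function names are assumptions chosen for illustration, not values or code reported by the authors.

```python
import math

def phrase_component(t, alpha=2.0):
    """Impulse response Gp(t) of the phrase control mechanism (zero before onset)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)

def tone_component(t, beta=20.0, gamma=0.9):
    """Step response Ga(t) of the tone control mechanism, clipped at ceiling gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def fujisaki_f0(t, fb, phrase_cmds, tone_cmds, alpha=2.0, beta=20.0, gamma=0.9):
    """F0 in Hz at time t (seconds).

    fb          -- base frequency Fb in Hz
    phrase_cmds -- list of (T0, Ap): onset time and phrase command magnitude
    tone_cmds   -- list of (T1, T2, At): onset, offset and amplitude of a tone
                   command; At may be negative (falling) or positive (rising)
    """
    ln_f0 = math.log(fb)
    for t0, ap in phrase_cmds:
        ln_f0 += ap * phrase_component(t - t0, alpha)
    for t1, t2, at in tone_cmds:
        ln_f0 += at * (tone_component(t - t1, beta, gamma)
                       - tone_component(t - t2, beta, gamma))
    return math.exp(ln_f0)

# Hypothetical example: one phrase command plus a positive and a negative tone command.
phrases = [(0.0, 0.5)]
tones = [(0.3, 0.5, 0.4), (0.6, 0.8, -0.4)]
contour = [fujisaki_f0(i * 0.01, 100.0, phrases, tones) for i in range(120)]
```

Because the components add in the log domain, a positive tone command raises F0 by a constant ratio rather than a constant number of Hz. Once a command's offset has passed and both step responses have saturated, its contribution cancels and F0 returns to the phrase-level contour. Which tone is mapped to which polarity, and with what timing, is not specified in the abstract, so the command values above are purely hypothetical.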

Bibliographic reference. Mixdorff, Hansjorg / Bach, Nguyen Hung / Fujisaki, Hiroya / Luong, Mai Chi (2003): "Quantitative analysis and synthesis of syllabic tones in Vietnamese", in EUROSPEECH-2003, 177-180.