Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Improvement of a Physiological Articulatory Model for Synthesis of Vowel Sequences

Jianwu Dang, Kiyoshi Honda

ATR Human Information Processing Research Labs, Kyoto, Japan

A 3D physiological articulatory model has been constructed based on volumetric MRI data obtained from a male speaker. The model is driven by muscles according to a target-dependent activation pattern. In this study, we improved the dynamic characteristics of the model to produce higher sound quality for vowel sequences. The dynamic characteristics of the articulatory organs were investigated using X-ray microbeam data for vowel sequences and vowel-consonant-vowel (VCV) sequences from 11 Japanese speakers. The velocity of the tongue tip was found to be about 60% higher in vowel-to-consonant transitions than in vowel-to-vowel transitions, while the velocities of the tongue dorsum and jaw were independent of the sequence type. The reaction time of the articulators, measured from maximal acceleration to maximal velocity, was about 40% shorter in vowel-to-consonant transitions than in vowel-to-vowel transitions. To apply the improved model to speech analysis, articulatory targets were estimated for the vowels in vowel sequences using the analysis-by-synthesis (AbS) method and then used to generate vocal tract shapes for the vowel sequences. The vocal tract shapes and synthetic sounds were compared with speech sounds and articulatory data from the target speaker. The results showed that the model exhibits plausible dynamic characteristics of articulatory movement in producing vowel sequences. The simulation error was about 2.5% for the formants and 0.2 cm for the observation points of the vocal tract.
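The reaction-time measure used in the abstract (the time from maximal acceleration to maximal velocity) can be illustrated with a minimal Python sketch. This is not the authors' analysis procedure: the function names, the finite-difference scheme, and the synthetic minimum-jerk trajectory standing in for X-ray microbeam data are all assumptions made for illustration.

```python
def central_diff(samples, dt):
    """Central-difference derivative; one-sided at the two end points."""
    n = len(samples)
    out = [0.0] * n
    for i in range(1, n - 1):
        out[i] = (samples[i + 1] - samples[i - 1]) / (2.0 * dt)
    out[0] = (samples[1] - samples[0]) / dt
    out[-1] = (samples[-1] - samples[-2]) / dt
    return out


def reaction_time(position, dt):
    """Return (reaction time, peak velocity) for one articulator movement.

    Assumption: `position` is a 1-D trajectory (e.g. in cm) that moves in
    the positive direction and is sampled every `dt` seconds.
    """
    velocity = central_diff(position, dt)
    acceleration = central_diff(velocity, dt)
    # Index of the velocity peak.
    i_vel = max(range(len(velocity)), key=velocity.__getitem__)
    # Look for the acceleration peak only up to the velocity peak.
    i_acc = max(range(i_vel + 1), key=acceleration.__getitem__)
    return (i_vel - i_acc) * dt, velocity[i_vel]


def minimum_jerk(amplitude, duration, dt):
    """Synthetic smooth movement (hypothetical stand-in for measured data)."""
    n = int(round(duration / dt)) + 1
    trajectory = []
    for k in range(n):
        tau = k * dt / duration  # normalized time in [0, 1]
        trajectory.append(amplitude * (10 * tau**3 - 15 * tau**4 + 6 * tau**5))
    return trajectory


# Illustrative values only: a 1 cm movement lasting 0.2 s, sampled at 1 kHz.
trajectory = minimum_jerk(amplitude=1.0, duration=0.2, dt=0.001)
rt, peak_velocity = reaction_time(trajectory, dt=0.001)
print(f"reaction time: {rt:.3f} s, peak velocity: {peak_velocity:.2f} cm/s")
```

Applied to measured trajectories for vowel-to-consonant and vowel-to-vowel transitions, a function of this kind would yield the two quantities compared in the abstract. Real data would likely need low-pass filtering before differentiation, which this sketch omits.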


Bibliographic reference.  Dang, Jianwu / Honda, Kiyoshi (2000): "Improvement of a physiological articulatory model for synthesis of vowel sequences", In ICSLP-2000, vol.1, 457-460.