Auditory-Visual Speech Processing 2005, British Columbia, Canada
We describe our progress on the construction of a combined 3D face and vocal tract simulator for articulatory speech synthesis called ArtiSynth. The architecture comprises six main modules: (1) a simulator engine and synthesis framework, (2) a two- and three-dimensional model development component, (3) a numerics engine, (4) a graphical renderer, (5) an audio synthesis engine, and (6) a graphical user interface (GUI). We have built infrastructure for creating vocal tract models from combinations of rigid-body, spring-mass, and finite element models, as well as parametric models. The infrastructure provides mechanisms to "glue" these and other model types together to create hybrids, so that dynamical models whose equations of motion are integrated numerically and animatable parametric models are combined in a single framework. Using ArtiSynth we have created a complex, dynamic jaw model driven by muscle models, a parametric tongue model, a face model, two lip models, and a source-filter acoustic model linked to the vocal tract via an airway model. These components have been connected to form a complete vocal tract that produces speech and can be driven both by data and by dynamics.
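The central architectural idea is that numerically integrated dynamical models and data-driven parametric models share a single simulation loop. The following Java sketch illustrates one way such heterogeneous components could be glued together under a common stepping interface; the class and method names (Component, SpringMassComponent, ParametricJawComponent, Simulator) are hypothetical stand-ins for exposition and are not the actual ArtiSynth API.

```java
// Conceptual sketch only: hypothetical interfaces, not the ArtiSynth API.
import java.util.ArrayList;
import java.util.List;

interface Component {
    void advance(double t, double dt);    // move the component to time t + dt
}

/** A dynamical model whose equations of motion are integrated numerically. */
class SpringMassComponent implements Component {
    double x = 1.0, v = 0.0;              // position and velocity
    final double k = 40.0, m = 1.0;       // stiffness and mass

    public void advance(double t, double dt) {
        double a = -(k / m) * x;          // F = -kx, explicit Euler step
        v += a * dt;
        x += v * dt;
    }
}

/** A parametric model animated directly from a prescribed trajectory. */
class ParametricJawComponent implements Component {
    double opening = 0.0;                 // jaw opening driven by data

    public void advance(double t, double dt) {
        opening = 0.5 * (1 - Math.cos(2 * Math.PI * t));  // prescribed motion
    }
}

/** Glues heterogeneous components together and steps them in lock-step. */
class Simulator {
    private final List<Component> components = new ArrayList<>();

    void add(Component c) { components.add(c); }

    void run(double duration, double dt) {
        for (double t = 0; t < duration; t += dt) {
            for (Component c : components) {
                c.advance(t, dt);
            }
        }
    }

    public static void main(String[] args) {
        Simulator sim = new Simulator();
        sim.add(new SpringMassComponent());   // dynamics-driven
        sim.add(new ParametricJawComponent()); // data-driven
        sim.run(1.0, 0.001);                  // one second of simulated motion
    }
}
```

In this reading of the architecture, a hybrid vocal tract is simply a collection of such components advanced together, which is what allows the complete model to be driven both by data and by dynamics.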
Bibliographic reference. Fels, Sidney / Vogt, Florian / Doel, Kees van den / Lloyd, John E. / Guenther, Oliver (2005): "ArtiSynth: an extensible, cross-platform 3D articulatory speech synthesizer", in AVSP 2005, 119-124.