AVSP 2003 - International Conference on Audio-Visual Speech Processing

September 4-7, 2003
St. Jorioz, France

Two Articulation Models for Audiovisual Speech Synthesis - Description and Determination

Sascha Fagel, Caroline Clemens

Department of Communication Science, Technical University Berlin, Germany

The authors present two visual articulation models for speech synthesis and methods to obtain them from measured data. The visual articulation models are used to control visible articulator movements described by six motion parameters: one for the up-down movement of the lower jaw, three for the lips, and two for the tongue (see section 2.1 for details). To obtain the data, a female speaker was measured with the 2D articulograph AG100 [1] and simultaneously filmed. The first visual articulation model is a hybrid data- and rule-based model that selects and combines the most similar viseme patterns (section 2.3). It is derived more or less directly from the measurements. The second model (section 2.4) is rule-based, following the dominance principle suggested by Löfqvist [2][3]. The parameter values for the second model are derived from the first one. Both models are integrated into MASSY, the Modular Audiovisual Speech SYnthesizer [4].
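The dominance-based control of the second model can be illustrated with a short sketch. The following Python fragment is a minimal, illustrative blending scheme in the spirit of the dominance functions of [2][3]; the parameter names, target values, and the exponential dominance shape are assumptions for illustration, not the implementation used in MASSY.

    import math

    # Six motion parameters as described in section 2.1 (names are illustrative):
    # one for the jaw, three for the lips, two for the tongue.
    MOTION_PARAMS = ["jaw_height", "lip_width", "lip_opening", "lip_protrusion",
                     "tongue_tip_height", "tongue_back_height"]

    def dominance(t, center, alpha, theta):
        # Exponentially decaying dominance of a segment around its center time (s).
        return alpha * math.exp(-theta * abs(t - center))

    def blend(t, segments):
        # Weighted average of the segments' targets for one motion parameter.
        # segments: list of (center_time, target_value, alpha, theta) tuples.
        num = den = 0.0
        for center, target, alpha, theta in segments:
            d = dominance(t, center, alpha, theta)
            num += d * target
            den += d
        return num / den if den > 0.0 else 0.0

    # Example: jaw opening over a /b a/ sequence; the consonant has a low,
    # strongly dominant target, the vowel a high, weakly dominant target.
    segments = [(0.05, 0.1, 1.0, 30.0),   # /b/
                (0.20, 0.8, 0.7, 20.0)]   # /a/
    for t in (0.05, 0.10, 0.15, 0.20):
        print("t=%.2fs  jaw opening=%.2f" % (t, blend(t, segments)))

The weighted average makes neighbouring segments pull the parameter trajectory toward their targets in proportion to their dominance, which is how rule-based coarticulation of this kind is usually realized.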

References

  1. AG 100 - Electromagnetic Articulography, http://www.articulograph.de.
  2. A. Löfqvist, "Speech as Audible Gestures", in W. J. Hardcastle, A. Marchal (eds.), Speech Production and Speech Modeling, Dordrecht: Kluwer Academic Publishers, 1990.
  3. M. M. Cohen, D. W. Massaro, "Modeling Co-articulation in Synthetic Visual Speech", in N. Magnenat Thalmann, D. Thalmann (eds.), Models and Techniques in Computer Animation, pp. 139-156, Tokyo: Springer-Verlag, 1993.
  4. S. Fagel, "MASSY - a Prototypic Implementation of the Modular Audiovisual Speech SYnthesizer", Proceedings of the 15th International Congress of Phonetic Sciences (to appear), Barcelona, 2003.


Bibliographic reference. Fagel, Sascha / Clemens, Caroline (2003): "Two articulation models for audiovisual speech synthesis - description and determination", in AVSP 2003, 215-220.