The authors present two visual articulation models for speech synthesis and methods to obtain them from measured data. The visual articulation models are used to control visible articulator movements described by six motion parameters: one for the up-down movement of the lower jaw, three for the lips and two for the tongue (see section 2.1 for details). To obtain the data, a female speaker was measured with the 2D articulograph AG100 [1] and simultaneously filmed. The first visual articulation model is a hybrid data- and rule-based model that selects and combines the most similar viseme patterns (section 2.3). It is obtained more or less directly from the measurements. The second model (section 2.4) is rule-based, following the dominance principle suggested by Löfqvist [2][3]. The parameter values for the second model are derived from the first one. Both models are integrated into MASSY, the Modular Audiovisual Speech SYnthesizer [4].
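To illustrate the dominance principle mentioned above, the following Python sketch blends per-segment articulation targets with negative-exponential dominance functions in the spirit of Löfqvist [2] and Cohen & Massaro [3]. It is not the authors' implementation: the parameter labels, the exponential shape and all numeric values are assumptions chosen for the example, following only the abstract's description of one jaw, three lip and two tongue parameters.

import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

# The six motion parameters described in the abstract (labels are assumed).
PARAMETERS = [
    "jaw_opening",                                    # up-down movement of the lower jaw
    "lip_opening", "lip_rounding", "lip_protrusion",  # three lip parameters (assumed labels)
    "tongue_tip", "tongue_body",                      # two tongue parameters (assumed labels)
]

@dataclass
class Segment:
    center: float                            # temporal centre of the segment in seconds
    targets: Dict[str, float]                # parameter -> target value
    dominance: Dict[str, Tuple[float, float]]  # parameter -> (alpha, theta): strength, decay

def dominance(segment: Segment, parameter: str, t: float) -> float:
    """Negative-exponential dominance of one segment for one parameter at time t."""
    alpha, theta = segment.dominance[parameter]
    return alpha * math.exp(-theta * abs(t - segment.center))

def parameter_value(segments: List[Segment], parameter: str, t: float) -> float:
    """Dominance-weighted average of the segment targets at time t."""
    weights = [dominance(s, parameter, t) for s in segments]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(w * s.targets[parameter] for w, s in zip(weights, segments)) / total

if __name__ == "__main__":
    # Two hypothetical segments: a rounded vowel followed by an alveolar stop.
    neutral = {p: 0.0 for p in PARAMETERS}
    vowel = Segment(center=0.10,
                    targets={**neutral, "lip_rounding": 1.0, "jaw_opening": 0.6},
                    dominance={p: (1.0, 20.0) for p in PARAMETERS})
    stop = Segment(center=0.25,
                   targets={**neutral, "tongue_tip": 1.0},
                   # weak lip dominance lets the vowel's rounding persist into the stop
                   dominance={**{p: (1.0, 20.0) for p in PARAMETERS},
                              "lip_rounding": (0.2, 20.0)})
    for t in (0.10, 0.15, 0.20, 0.25):
        print(f"t={t:.2f}s  lip_rounding={parameter_value([vowel, stop], 'lip_rounding', t):.3f}")

The printed lip-rounding values stay high well past the vowel's centre because the stop's lip dominance is weak, which is the coarticulatory carryover effect the dominance principle is meant to capture. Cohen and Massaro's full model additionally uses separate anticipatory and carryover decay rates and a shape exponent, which this sketch omits.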
[1] AG 100 - Electromagnetic Articulography, http://www.articulograph.de.
[2] A. Löfqvist, "Speech as Audible Gestures", in W. J. Hardcastle, A. Marchal (eds.), Speech Production and Speech Modeling, Dordrecht: Kluwer Academic Publishers, 1990.
[3] M. M. Cohen, D. W. Massaro, "Modeling Co-articulation in Synthetic Visual Speech", in N. Magnenat Thalmann, D. Thalmann (eds.), Models and Techniques in Computer Animation, pp. 139-156, Tokyo: Springer-Verlag, 1993.
[4] S. Fagel, "MASSY - a Prototypic Implementation of the Modular Audiovisual Speech SYnthesizer", Proceedings of the 15th International Congress of Phonetic Sciences (to appear), Barcelona, 2003.
Cite as: Fagel, S., Clemens, C. (2003) Two articulation models for audiovisual speech synthesis - description and determination. Proc. Auditory-Visual Speech Processing, 215-220
@inproceedings{fagel03_avsp,
  author={Sascha Fagel and Caroline Clemens},
  title={{Two articulation models for audiovisual speech synthesis - description and determination}},
  year=2003,
  booktitle={Proc. Auditory-Visual Speech Processing},
  pages={215--220}
}