In this paper, a system for the automatic lip synchronization of a virtual 3D human based only on speech input is described. The speech signal is classified into viseme classes using neural networks. Visemes, the visual representations of phonemes defined in the MPEG-4 Facial Animation (FA) standard, are used for face synthesis.
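The phoneme-to-viseme mapping underlying this approach can be illustrated with a minimal sketch. The viseme indices below follow the MPEG-4 FA viseme table (index 0 is the neutral face; e.g. /p/, /b/, /m/ share viseme 1), but the phoneme label set and the lookup function are illustrative assumptions, not the paper's implementation, which derives viseme classes from the speech signal via neural networks rather than from a phoneme transcript.

```python
# Illustrative subset of the MPEG-4 FA phoneme-to-viseme table.
# Viseme 0 is the neutral face; phonemes in the same class share
# a mouth shape. Labels are SAMPA-like and hypothetical here.
PHONEME_TO_VISEME = {
    "none": 0,              # neutral
    "p": 1, "b": 1, "m": 1,  # bilabials
    "f": 2, "v": 2,          # labiodentals
    "t": 4, "d": 4,          # alveolar stops
    "k": 5, "g": 5,          # velar stops
    "s": 7, "z": 7,          # alveolar fricatives
    "n": 8, "l": 8,
    "A": 10,                 # open vowel
}

def phonemes_to_visemes(phonemes):
    """Map a phoneme sequence to MPEG-4 viseme indices.

    Unknown phonemes fall back to the neutral viseme (0).
    """
    return [PHONEME_TO_VISEME.get(p, 0) for p in phonemes]

print(phonemes_to_visemes(["m", "A", "m", "A"]))  # -> [1, 10, 1, 10]
```

In the described system the classifier outputs such viseme indices directly from acoustic features, which the face-synthesis stage then renders as MPEG-4 FA mouth shapes.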
Bibliographic reference: Zoric, Goranka / Cerekovic, Aleksandra / Pandzic, Igor S. (2008): "Automatic lip synchronization by speech signal analysis", in INTERSPEECH 2008, 2323.