Auditory-Visual Speech Processing
We present a linear three-dimensional modeling paradigm for lips and face that captures the audiovisual speech activity of a given speaker with only six parameters. Our articulatory models are constructed from real data (front and profile images), using a linear component analysis of about 200 3D coordinates of fleshpoints on the subject's face and lips. Compared to a raw component analysis, our construction approach leads to somewhat more comparable relations across subjects: by construction, the six parameters have a clear phonetic/articulatory interpretation. We use such a speaker-specific articulatory model to regularize MPEG-4 facial articulation parameters (FAP) and show that this regularization process can drastically reduce bandwidth, noise and quantization artifacts. We then show how analysis-by-synthesis techniques using the speaker-specific model allow the tracking of facial movements. Finally, the results of this tracking scheme have been used to develop a text-to-audiovisual speech system.
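As a rough illustration of the idea of a six-parameter linear model used as a regularizer, the sketch below fits a linear subspace to flattened fleshpoint coordinates and projects a noisy frame back onto it. It is only a minimal sketch under stated assumptions: plain PCA (via SVD) stands in for the guided component analysis described above, the data are synthetic, the count of 200 fleshpoints with three coordinates each is one assumed reading of the abstract, and all function names (`fit_linear_model`, `encode`, `decode`, `regularize`) are hypothetical.

```python
import numpy as np

def fit_linear_model(frames, n_params=6):
    """Fit mean + orthonormal basis to frames of shape (n_frames, n_coords).

    Assumption: plain PCA is used here as a stand-in; the paper's own
    construction is guided by articulatory interpretation, not raw PCA.
    """
    mean = frames.mean(axis=0)
    _, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
    basis = vt[:n_params]  # (n_params, n_coords), rows are orthonormal
    return mean, basis

def encode(shape, mean, basis):
    """Project one flattened face shape onto the model parameters."""
    return basis @ (shape - mean)

def decode(params, mean, basis):
    """Rebuild a flattened face shape from the model parameters."""
    return mean + params @ basis

def regularize(shape, mean, basis):
    """Keep only the part of a shape that the linear model can explain."""
    return decode(encode(shape, mean, basis), mean, basis)

# Synthetic stand-in data (assumption): 200 fleshpoints x 3 coordinates,
# generated from six hidden parameters plus a constant neutral shape.
rng = np.random.default_rng(0)
n_coords = 3 * 200
true_basis = rng.normal(size=(6, n_coords))
frames = rng.normal(size=(50, 6)) @ true_basis + rng.normal(size=n_coords)

mean, basis = fit_linear_model(frames)
clean = frames[0]
noisy = clean + 0.1 * rng.normal(size=n_coords)
restored = regularize(noisy, mean, basis)
```

Transmitting the six values returned by `encode` instead of every coordinate is what reduces bandwidth in this toy setting, and the projection in `regularize` discards the noise components that lie outside the model subspace.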
Bibliographic reference. Elisei, F. / Odisio, M. / Bailly, Gérard / Badin, Pierre (2001): "Creating and controlling video-realistic talking heads", In AVSP-2001, 90-97.
Link | Original Filename | Description | Format |
--- | --- | --- | --- |
av01_090_01.avi (1183 KB) | NomoJ1.avi | First jaw-driven articulator (height) | Video File - AVI |
av01_090_02.avi (1182 KB) | NomoL1.avi | First lips-driven articulator (width/protrusion) | Video File - AVI |
av01_090_03.avi (1185 KB) | NomoL2.avi | Second lips-driven articulator (lower lip) | Video File - AVI |
av01_090_04.avi (1185 KB) | NomoL3.avi | Second lips-driven articulator (lower lip) | Video File - AVI |
av01_090_05.avi (1184 KB) | NomoJ2.avi | Second jaw articulator (advance) | Video File - AVI |
av01_090_06.avi (1182 KB) | NomoL1.avi | Residual articulator (larynx skin) | Video File - AVI |
av01_090_07.avi (1775 KB) | fap_capuchon.avi | Playing the same FAP stream on 2 different clones | Video File - AVI |
av01_090_08.avi (1052 KB) | aga_half.avi | Side by side: analysis-by-synthesis inversion, half superimposed on the tracked video (learning conditions) | Video File - AVI |
av01_090_09.avi (1984 KB) | salam_vid.avi | Example of reconstruction/tracking in learning conditions | Video File - AVI |
av01_090_10.avi (21566 KB) | bise.avi | Reconstruction of a long sequence (tracked in learning conditions), with recovered jaw movements. | Video File - AVI |
av01_090_11.avi (1925 KB) | capuchon.avi | Tracking in natural conditions: superimposing the resulting articulations through the wire-frame model | Video File - AVI |
av01_090_12.avi (3493 KB) | jaw_recover.avi | Showing the recovered jaw movements (learning conditions) | Video File - AVI |
av01_090_13.avi (1456 KB) | jaw_recover_02.avi | Showing the recovered jaw and 3D movements from a single front-only view (natural conditions) | Video File - AVI |
av01_090_14.avi (20502 KB) | tts_icp_fr.avi | Output of our text-to-audiovisual speech system | Video File - AVI |
av01_090_15.avi (1703 KB) | 3D_tongue.avi | The ICP 3D tongue (linked with the jaw parameters) | Video File - AVI |