To render a high-quality, versatile 3D talking head, we construct a stable, high-frame-rate audio-visual (AV) data acquisition system. It captures the 3D position, surface orientation, and albedo texture of the talking head from video images, along with the corresponding speech signals. The system consists of a computer-controlled LED lighting subsystem, high-speed stereo cameras, a microphone, and a computer for synchronous recording of the multi-stream AV data. The collected image data is processed through a binocular photometric stereo 3D reconstruction pipeline. The pipeline automatically segments the face, computes the depth map with binocular stereo, computes the normal map with photometric stereo, generates the albedo texture, and finally constructs a highly detailed 3D model using the depth and normal cues as constraints. With the data collected by this system, we can capture high-quality dynamic facial performance synchronized with the subject's speech.
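The normal-map and albedo steps of the pipeline can be illustrated with a minimal sketch of classical Lambertian photometric stereo. This is not the implementation from the paper, whose details the abstract does not give. It assumes known, distant light directions, a shadow-free Lambertian surface, and at least three non-coplanar lights. The function name `photometric_stereo` and its argument layout are hypothetical.

```python
import numpy as np

def photometric_stereo(images, light_dirs):
    """Estimate per-pixel normals and albedo (illustrative sketch only).

    images:     (k, h, w) array of intensities, one image per light.
    light_dirs: (k, 3) array of unit light directions (assumed known).
    Assumes the Lambertian model I = albedo * dot(light, normal)
    with no shadows or specular highlights.
    """
    k, h, w = images.shape
    intensities = images.reshape(k, -1)          # (k, h*w)
    # Least-squares solve light_dirs @ G = intensities,
    # where each column of G is albedo * normal for one pixel.
    G, *_ = np.linalg.lstsq(light_dirs, intensities, rcond=None)
    albedo = np.linalg.norm(G, axis=0)           # length of each column
    normals = G / np.maximum(albedo, 1e-12)      # guard against zero albedo
    return normals.T.reshape(h, w, 3), albedo.reshape(h, w)
```

In a full pipeline of the kind described above, the recovered normals would then be combined with the stereo depth map. That fusion step is not sketched here.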
Bibliographic reference. Wang, Chaoyang / Wang, Lijuan / Matsushita, Yasuyuki / Huang, Bojun / Chen, Magnetro / Soong, Frank K. (2013): "Binocular photometric stereo acquisition and reconstruction for 3d talking head applications", In INTERSPEECH-2013, 2748-2752.