In this paper we present a data-driven 3D talking head system using facial video and a X-ray film database for speech research. In order to construct a database recording the three dimensional positions of articulators at phoneme-level, the feature points of articulators were defined and labeled in facial and X-ray images for each English phoneme. Dynamic displacement based deformations were used in three modes to simulate the motions of both external and internal articulators. For continuous speech, the articulatory movements of each phoneme within an utterance were concatenated. A blending function was also employed to smooth the concatenation. In audio-visual test, a set of minimal pairs were used as the stimuli to access the realistic degree of articulatory motions of the 3D talking head. In the experiments where the subjects are native speakers and professional English teachers, a word identification accuracy of 91.1% among 156 tests was obtained.
Bibliographic reference. Wang, Lan / Chen, Hui / Ouyang, JianJun (2009): "Evaluation of external and internal articulator dynamics for pronunciation learning", In INTERSPEECH-2009, 2247-2250.