10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Evaluation of External and Internal Articulator Dynamics for Pronunciation Learning

Lan Wang, Hui Chen, JianJun Ouyang

Chinese Academy of Sciences, China

In this paper we present a data-driven 3D talking head system for speech research, built from facial video and an X-ray film database. To construct a database recording the three-dimensional positions of the articulators at the phoneme level, feature points of the articulators were defined and labeled in facial and X-ray images for each English phoneme. Dynamic displacement-based deformations were used in three modes to simulate the motions of both external and internal articulators. For continuous speech, the articulatory movements of each phoneme within an utterance were concatenated, and a blending function was employed to smooth the concatenation. In an audio-visual test, a set of minimal pairs was used as stimuli to assess how realistic the articulatory motions of the 3D talking head are. In experiments with native speakers and professional English teachers as subjects, a word identification accuracy of 91.1% over 156 tests was obtained.
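
The concatenation-with-blending step mentioned above can be illustrated with a minimal sketch. The abstract does not specify the blending function, so the raised-cosine cross-fade used here is an assumption, as are the names `blend_weight` and `concatenate`, the `overlap` parameter, and the representation of each phoneme as a list of frames of feature-point displacements.

```python
import math

def blend_weight(t):
    """Raised-cosine weight rising smoothly from 0 at t=0 to 1 at t=1.
    (Assumed blending function; the paper's actual choice is not stated.)"""
    return 0.5 - 0.5 * math.cos(math.pi * t)

def concatenate(segments, overlap):
    """Join per-phoneme displacement trajectories with a cross-fade.

    segments: list of trajectories; each trajectory is a list of frames,
              and each frame is a list of floats (feature-point displacements).
    overlap:  number of frames shared between neighbouring segments.
    """
    out = [list(frame) for frame in segments[0]]
    for seg in segments[1:]:
        # Never blend over more frames than either side actually has.
        n = min(overlap, len(out), len(seg))
        start = len(out) - n
        for i in range(n):
            w = blend_weight((i + 1) / (n + 1))
            # Fade out the tail of the previous phoneme, fade in the next one.
            out[start + i] = [(1 - w) * a + w * b
                              for a, b in zip(out[start + i], seg[i])]
        out.extend(list(frame) for frame in seg[n:])
    return out

# Hypothetical one-dimensional trajectories for two phonemes.
rest = [[0.0]] * 4
open_jaw = [[1.0]] * 4
trajectory = concatenate([rest, open_jaw], overlap=2)
print(trajectory)
```

In this sketch the overlapping frames move gradually from the first phoneme's displacement to the second's instead of jumping at the boundary, which is the effect a smoothing function at the joins is meant to achieve.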


Bibliographic reference. Wang, Lan / Chen, Hui / Ouyang, JianJun (2009): "Evaluation of external and internal articulator dynamics for pronunciation learning", in INTERSPEECH-2009, 2247-2250.