ISCA Archive Interspeech 2009

Evaluation of external and internal articulator dynamics for pronunciation learning

Lan Wang, Hui Chen, JianJun Ouyang

In this paper we present a data-driven 3D talking head system, built from facial video and an X-ray film database, for speech research. To construct a database recording the three-dimensional positions of articulators at the phoneme level, feature points on the articulators were defined and labeled in facial and X-ray images for each English phoneme. Dynamic displacement-based deformations were used in three modes to simulate the motions of both external and internal articulators. For continuous speech, the articulatory movements of the phonemes within an utterance were concatenated, and a blending function was employed to smooth the joins. In an audio-visual test, a set of minimal pairs served as stimuli to assess the realism of the talking head's articulatory motions. In experiments whose subjects were native speakers and professional English teachers, a word identification accuracy of 91.1% over 156 trials was obtained.
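The paper does not give the form of its blending function, but the idea of concatenating per-phoneme articulator trajectories and smoothing the joins can be illustrated with a simple linear cross-fade over an overlap window (a hypothetical sketch, not the authors' implementation; the function name and overlap scheme are assumptions):

```python
import numpy as np

def blend_concatenate(traj_a, traj_b, overlap):
    """Concatenate two articulator trajectories (frames x dims),
    cross-fading linearly over `overlap` frames to smooth the join.
    Illustrative only: the paper's actual blending function is unspecified."""
    w = np.linspace(0.0, 1.0, overlap)[:, None]   # blend weight per frame
    blended = (1.0 - w) * traj_a[-overlap:] + w * traj_b[:overlap]
    return np.vstack([traj_a[:-overlap], blended, traj_b[overlap:]])

# Example: two constant 3D trajectories joined with a 4-frame cross-fade
a = np.full((10, 3), 2.0)
b = np.full((8, 3), 6.0)
out = blend_concatenate(a, b, overlap=4)
```

The result has `10 + 8 - 4 = 14` frames, starts at the value of the first trajectory, ends at the value of the second, and transitions monotonically through the overlap region instead of jumping at the phoneme boundary.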


doi: 10.21437/Interspeech.2009-638

Cite as: Wang, L., Chen, H., Ouyang, J. (2009) Evaluation of external and internal articulator dynamics for pronunciation learning. Proc. Interspeech 2009, 2247-2250, doi: 10.21437/Interspeech.2009-638

@inproceedings{wang09i_interspeech,
  author={Lan Wang and Hui Chen and JianJun Ouyang},
  title={{Evaluation of external and internal articulator dynamics for pronunciation learning}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={2247--2250},
  doi={10.21437/Interspeech.2009-638}
}