In this study, a system that generates visual speech by synthesizing 3D face points has been implemented. The synthesized face points drive MPEG-4 facial animation. To produce realistic and natural speech animation, a codebook-based technique trained on audio-visual data from a single speaker was employed. An audio-visual speech database was created using a 3D facial motion capture system developed for this study. To improve performance across different speakers, further training was performed with audio-only data from a small number of speakers. The resulting system can animate a face from the input speech of any Turkish speaker.
Cite as: Savran, A., Arslan, L.M., Akarun, L. (2004) Speech driven MPEG-4 facial animation for Turkish. Proc. 9th Conference on Speech and Computer (SPECOM 2004), 57-64
@inproceedings{savran04_specom,
  author={Arman Savran and Levent M. Arslan and Lale Akarun},
  title={{Speech driven MPEG-4 facial animation for Turkish}},
  year={2004},
  booktitle={Proc. 9th Conference on Speech and Computer (SPECOM 2004)},
  pages={57--64}
}