ISCA Archive Interspeech 2005

Data-driven synthesis of expressive visual speech using an MPEG-4 talking head

Jonas Beskow, Mikael Nordenberg

This paper describes initial experiments with synthesis of visual speech articulation for different emotions, using a newly developed MPEG-4 compatible talking head. The basic problem with combining speech and emotion in a talking head is to handle the interaction between emotional expression and articulation in the orofacial region. Rather than trying to model speech and emotion as two separate properties, the strategy taken here is to incorporate emotional expression in the articulation from the beginning. We use a data-driven approach, training the system to recreate the expressive articulation produced by an actor while portraying different emotions. Each emotion is modelled separately using principal component analysis and a parametric coarticulation model. The results so far are encouraging but more work is needed to improve naturalness and accuracy of the synthesized speech.
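The per-emotion modelling step described above applies principal component analysis to recorded facial parameter trajectories. A minimal sketch of such a PCA reduction is shown below, using synthetic data; the array sizes, the number of retained components, and the data itself are illustrative assumptions, not the authors' actual pipeline or MPEG-4 parameter set.

```python
import numpy as np

# Sketch: PCA over per-frame facial parameter vectors, as might be done
# per emotion. Synthetic data only; sizes are assumptions for illustration.
rng = np.random.default_rng(0)
n_frames, n_params = 500, 30          # assumed frame count and parameter count
X = rng.normal(size=(n_frames, n_params))

# Center the data and compute principal components via SVD.
mean = X.mean(axis=0)
Xc = X - mean
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 5                                  # number of retained components (assumption)
components = Vt[:k]                    # (k, n_params) basis of the articulation space
scores = Xc @ components.T             # low-dimensional trajectory, one row per frame

# Reconstruct frames from the reduced representation and measure the error.
X_rec = scores @ components + mean
err = np.linalg.norm(X - X_rec) / np.linalg.norm(X)
print(round(float(err), 3))
```

With real motion-capture data the leading components typically capture most of the articulatory variance, so a small `k` suffices; a coarticulation model can then be fitted in the reduced score space rather than on the raw parameters.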


doi: 10.21437/Interspeech.2005-376

Cite as: Beskow, J., Nordenberg, M. (2005) Data-driven synthesis of expressive visual speech using an MPEG-4 talking head. Proc. Interspeech 2005, 793-796, doi: 10.21437/Interspeech.2005-376

@inproceedings{beskow05_interspeech,
  author={Jonas Beskow and Mikael Nordenberg},
  title={{Data-driven synthesis of expressive visual speech using an MPEG-4 talking head}},
  year={2005},
  booktitle={Proc. Interspeech 2005},
  pages={793--796},
  doi={10.21437/Interspeech.2005-376}
}