FAAVSP - The 1st Joint Conference on Facial Analysis, Animation, and
Auditory-Visual Speech Processing

Vienna, Austria
September 11-13, 2015

Improved Visual Speech Synthesis Using Dynamic Viseme k-means Clustering and Decision Trees

Christiaan Rademan, Thomas Niesler

Department of Electrical and Electronic Engineering, University of Stellenbosch, South Africa

We present a decision tree-based viseme clustering technique that allows visual speech synthesis after training on a small dataset of phonetically annotated audiovisual speech. Incorporating k-means clustering into the decision tree training algorithm improves viseme grouping. The use of overlapping dynamic visemes, defined by triphone time-varying oral pose boundaries, improves the modelling of coarticulation effects. We show that our approach leads to a clear improvement over a comparable baseline in perceptual tests. The avatar is based on the freely available MakeHuman and Blender software components.

Index Terms: conversational agent, talking head, visual speech synthesis, lip animation, coarticulation modelling, CART-based viseme clustering, audio-visual speech data corpus.
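
As a rough illustration of the k-means step named in the abstract, the sketch below clusters toy "dynamic visemes" with plain Lloyd's k-means. It is not the authors' algorithm: the abstract does not specify the feature representation, the distance measure, or how the clustering is coupled to decision tree training. The feature layout (3 frames by 2 oral pose parameters), the farthest-first initialisation, and all function and variable names are assumptions made for this example only.

```python
import numpy as np

def farthest_first_init(points, k):
    """Deterministic initialisation (an assumption, chosen for reproducibility):
    start from the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far."""
    centroids = [points[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen centroid.
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(points[d.argmax()])
    return np.array(centroids)

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    centroids = farthest_first_init(points, k)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical data: each row is one dynamic viseme, flattened from
# 3 frames x 2 oral pose parameters (for example lip opening and lip width).
rng = np.random.default_rng(1)
closed_mouth = rng.normal(loc=0.0, scale=0.05, size=(20, 6))
open_mouth = rng.normal(loc=1.0, scale=0.05, size=(20, 6))
visemes = np.vstack([closed_mouth, open_mouth])

centroids, labels = kmeans(visemes, k=2)
```

Here `labels` gives the cluster index of each toy viseme and `centroids` holds one representative oral pose trajectory per cluster. In a decision tree setting, cluster assignments of this kind could serve as the groups that phonetic-context questions try to separate, but that coupling is a guess and is not described in the abstract.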


Bibliographic reference. Rademan, Christiaan / Niesler, Thomas (2015): "Improved visual speech synthesis using dynamic viseme k-means clustering and decision trees", in FAAVSP-2015, 169-174.