FAAVSP - The 1st Joint Conference on Facial Analysis, Animation, and Auditory-Visual Speech Processing
We present a decision tree-based viseme clustering technique
that allows visual speech synthesis after training on a small
dataset of phonetically-annotated audiovisual speech. The decision
trees allow improved viseme grouping by incorporating
k-means clustering into the training algorithm.
The use of overlapping dynamic visemes, defined by tri-phone
time-varying oral pose boundaries, allows improved modelling
of coarticulation effects. We show that our approach leads to
a clear improvement over a comparable baseline in perceptual tests.
The avatar is based on the freely available MakeHuman and Blender software components.
Index Terms: conversational agent, talking head, visual speech synthesis, lip animation, coarticulation modelling, CART-based viseme clustering, audio-visual speech data corpus.
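To make the clustering idea in the abstract more concrete, the following is a minimal sketch of grouping time-varying oral pose segments with k-means. It is not the authors' implementation: the one-dimensional "lip opening" feature, the fixed-length resampling, the farthest-point initialisation, and all function names (`resample`, `kmeans`) are assumptions chosen for illustration, and the decision-tree stage is omitted entirely.

```python
import math


def resample(trajectory, n_points=5):
    """Linearly resample a 1-D pose trajectory to n_points samples
    (an assumed way of making variable-length segments comparable)."""
    if len(trajectory) == 1:
        return [float(trajectory[0])] * n_points
    out = []
    for i in range(n_points):
        pos = i * (len(trajectory) - 1) / (n_points - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(trajectory) - 1)
        frac = pos - lo
        out.append((1 - frac) * trajectory[lo] + frac * trajectory[hi])
    return out


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


# Toy lip-opening trajectories (hypothetical values, not from the paper's corpus).
segments = [
    [0.1, 0.8, 0.2],        # mouth opens and closes
    [0.0, 0.9, 0.9, 0.1],   # similar gesture, different duration
    [0.1, 0.0, 0.1],        # mouth stays nearly closed
    [0.0, 0.05, 0.0, 0.1],  # similar, different duration
]
features = [resample(s) for s in segments]
labels, centroids = kmeans(features, k=2)
```

In this toy setting each cluster centroid plays the role of a dynamic viseme: a representative pose trajectory rather than a single static mouth shape. A real system would use multi-dimensional pose features and would need a principled way to choose the number of clusters.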
Bibliographic reference. Rademan, Christiaan / Niesler, Thomas (2015): "Improved visual speech synthesis using dynamic viseme k-means clustering and decision trees", In FAAVSP-2015, 169-174.