We describe an unsupervised probabilistic approach for synthesising visual speech from audio. Acoustic features representing a training corpus are clustered and the probability density function (PDF) of each cluster is modelled as a Gaussian mixture model (GMM). A visual target, in the form of a short-term parameter trajectory, is generated for each cluster. Synthesis involves combining the cluster targets according to the likelihood of novel acoustic feature vectors, then cross-blending neighbouring regions of the synthesised short-term trajectories. The advantage of our approach is that coarticulation effects are explicitly captured by the mapping: the influence of each cluster target naturally increases and decreases with the likelihood of the acoustic feature vectors.
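The core synthesis step, combining cluster targets weighted by GMM likelihoods, can be sketched as follows. This is a minimal 1-D illustration, not the authors' implementation: the single-feature GMMs, the cluster targets, and the function names are all hypothetical placeholders for the paper's higher-dimensional acoustic and visual parameterisations, and the cross-blending of neighbouring trajectory regions is omitted.

```python
import math

def gaussian_pdf(x, mean, var):
    # Density of a 1-D Gaussian at x (placeholder for the full covariance case).
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gmm_likelihood(x, components):
    # components: list of (weight, mean, var) tuples for one cluster's GMM.
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)

def combine_targets(x, cluster_gmms, cluster_targets):
    # Likelihood of the novel acoustic feature under each cluster's GMM...
    liks = [gmm_likelihood(x, gmm) for gmm in cluster_gmms]
    total = sum(liks)
    weights = [lik / total for lik in liks]
    # ...then a likelihood-weighted sum of the cluster target trajectories,
    # so each cluster's influence rises and falls with its likelihood.
    n = len(cluster_targets[0])
    return [sum(w * t[i] for w, t in zip(weights, cluster_targets))
            for i in range(n)]
```

For example, with two clusters centred at 0 and 4, an acoustic value near 0 yields a trajectory dominated by the first cluster's target, and a value near 4 by the second's.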
Bibliographic reference. Theobald, Barry-John / Wilkinson, Nicholas (2008): "A probabilistic trajectory synthesis system for synthesising visual speech", In INTERSPEECH-2008, 1857-1860.