In this paper we apply speaker-adaptive and speaker-dependent training of hidden Markov models (HMMs) to visual speech synthesis. In speaker-dependent training, we use data from a single speaker to train visual and acoustic HMMs. In speaker-adaptive training, a visual background model (average voice) is first trained on data from multiple speakers; this background model is then adapted to a new target speaker using a (possibly small) amount of data from that speaker. This concept has been applied successfully to acoustic speech synthesis. This paper demonstrates how model adaptation is applied in the visual domain to synthesize animations of talking faces. A perceptual evaluation shows that speaker-adaptive models outperform speaker-dependent models for small amounts of training/adaptation data.
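To illustrate the idea of adapting a background model to a target speaker, the following is a minimal sketch of an MLLR-style affine transform of Gaussian state means, estimated by least squares from a small number of target-speaker statistics. All variable names and values are illustrative assumptions, not the authors' actual pipeline or data:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Average voice": Gaussian mean vectors for a few HMM states,
# trained on many speakers (synthetic placeholder values here).
background_means = rng.normal(size=(5, 3))  # 5 states, 3-dim features

# True (unknown) target-speaker transform we try to recover
# (hypothetical, for demonstration only).
A_true = np.eye(3) * 1.2
b_true = np.array([0.5, -0.3, 0.1])

# Small amount of adaptation data: per-state statistics
# observed from the target speaker.
target_means = background_means @ A_true.T + b_true

# MLLR-style estimation: find one shared affine transform W = [A; b]
# mapping background means to target means by least squares.
X = np.hstack([background_means, np.ones((5, 1))])  # extended means
W, *_ = np.linalg.lstsq(X, target_means, rcond=None)

# Apply the shared transform to adapt all state means at once.
adapted_means = X @ W
```

The key property, exploited by speaker-adaptive synthesis, is that a single shared transform is estimated from few observations yet updates every state of the model, which is why adaptation works with much less data than speaker-dependent training.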
Index Terms: Visual speech synthesis, speaker-adaptive training, facial animation
Bibliographic reference. Schabus, Dietmar / Pucher, Michael / Hofer, Gregor (2012): "Speaker-adaptive visual speech synthesis in the HMM-framework", In INTERSPEECH-2012, 979-982.