Auditory-Visual Speech Processing (AVSP) 2013

Annecy, France
August 29 - September 1, 2013

Visual Control of Hidden-Semi-Markov-Model based Acoustic Speech Synthesis

Jakob Hollenstein (1,2), Michael Pucher (1), Dietmar Schabus (1,3)

(1) Telecommunications Research Center Vienna (FTW), Vienna, Austria
(2) Vienna University of Technology, Vienna, Austria
(3) Graz University of Technology, Graz, Austria

We show how to visually control acoustic speech synthesis by modelling the dependency between visual and acoustic parameters within the Hidden-Semi-Markov-Model (HSMM) based speech synthesis framework. A joint audio-visual model is trained with 3D facial marker trajectories as visual features. Since the dependencies of acoustic features on visual features are only present for certain phones, we implemented a model where dependencies are estimated for a set of vowels only. A subjective evaluation consisting of a vowel identification task showed that we can transform some vowel trajectories in a phonetically meaningful way by controlling the visual parameters in PCA space. These visual parameters can also be interpreted as fundamental visual speech motion components, which leads to an intuitive control model.
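The control scheme described above works in PCA space: facial marker trajectories are decomposed into principal motion components, and adjusting a component's coefficient changes the visual parameters that in turn drive the acoustics. As a rough illustration only (not the paper's implementation, and with hypothetical marker data and dimensions), a PCA-space manipulation of marker trajectories can be sketched like this:

```python
import numpy as np

# Illustrative sketch, NOT the authors' system: project 3D facial marker
# trajectories into PCA space, scale one motion component, and reconstruct.
rng = np.random.default_rng(0)

# Hypothetical data: 200 frames x (20 markers * 3 coordinates).
frames = rng.normal(size=(200, 60))

# PCA via SVD of the mean-centered trajectory matrix.
mean = frames.mean(axis=0)
centered = frames - mean
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

k = 5                               # keep the first k motion components
components = Vt[:k]                 # (k, 60) principal directions
scores = centered @ components.T    # (200, k) per-frame PCA coordinates

# "Visual control": amplify the first component (e.g. a jaw-opening-like
# motion) by a factor of 1.5 before mapping back to marker space.
controlled = scores.copy()
controlled[:, 0] *= 1.5
reconstructed = controlled @ components + mean

print(reconstructed.shape)  # (200, 60)
```

In the paper, the manipulated visual parameters would then condition the acoustic stream of the joint audio-visual HSMM, which is what makes the transformation audible as a vowel change.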

Index Terms: audio-visual speech synthesis, HMM-based speech synthesis, controllability

Full Paper

Bibliographic reference.  Hollenstein, Jakob / Pucher, Michael / Schabus, Dietmar (2013): "Visual control of hidden-semi-Markov-model based acoustic speech synthesis", In AVSP-2013, 31-36.