ISCA Archive AVSP 2013

Visual control of hidden-semi-Markov-model based acoustic speech synthesis

Jakob Hollenstein, Michael Pucher, Dietmar Schabus

We show how to visually control acoustic speech synthesis by modelling the dependency between visual and acoustic parameters within the Hidden-Semi-Markov-Model (HSMM) based speech synthesis framework. A joint audio-visual model is trained with 3D facial marker trajectories as visual features. Since the dependencies of acoustic features on visual features are only present for certain phones, we implemented a model where dependencies are estimated for a set of vowels only. A subjective evaluation consisting of a vowel identification task showed that we can transform some vowel trajectories in a phonetically meaningful way by controlling the visual parameters in PCA space. These visual parameters can also be interpreted as fundamental visual speech motion components, which leads to an intuitive control model.
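The abstract describes controlling visual parameters in PCA space, where the principal components act as fundamental visual speech motion components. As a rough, hypothetical illustration of that idea (not the authors' implementation), the sketch below runs PCA via SVD on stand-in 3D facial marker trajectories, projects frames into a low-dimensional control space, and edits one component score before mapping back; all data and dimensions here are invented for demonstration.

```python
import numpy as np

# Stand-in data: random values in place of real 3D facial marker
# trajectories (n_frames frames, n_markers markers with x/y/z each).
rng = np.random.default_rng(0)
n_frames, n_markers = 200, 20
X = rng.normal(size=(n_frames, n_markers * 3))

# Centre the data and obtain principal components via SVD.
mean = X.mean(axis=0)
Xc = X - mean
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Project frames onto the k leading components -- a low-dimensional
# "visual control space" analogous to the paper's PCA parameters.
k = 5
Z = Xc @ Vt[:k].T  # shape: (n_frames, k)

# Scaling a component score and reconstructing approximates the kind
# of intuitive control over visual motion the abstract describes.
Z_edit = Z.copy()
Z_edit[:, 0] *= 1.5
X_edit = Z_edit @ Vt[:k] + mean
print(Z.shape, X_edit.shape)
```

In the paper's setting the edited visual parameters would then drive the acoustic side through the joint audio-visual HSMM; this sketch covers only the PCA control step.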

Index Terms: audio-visual speech synthesis, HMM-based speech synthesis, controllability


Cite as: Hollenstein, J., Pucher, M., Schabus, D. (2013) Visual control of hidden-semi-Markov-model based acoustic speech synthesis. Proc. Auditory-Visual Speech Processing, 31-36

@inproceedings{hollenstein13_avsp,
  author={Jakob Hollenstein and Michael Pucher and Dietmar Schabus},
  title={{Visual control of hidden-semi-Markov-model based acoustic speech synthesis}},
  year=2013,
  booktitle={Proc. Auditory-Visual Speech Processing},
  pages={31--36}
}