Eighth ISCA Workshop on Speech Synthesis

Barcelona, Catalonia, Spain
August 31-September 2, 2013

Real-Time Control of Expressive Speech Synthesis Using Kinect Body Tracking

Christophe Veaux (1), Maria Astrinaki (2), Keiichiro Oura (3), Robert A. J. Clark (1), Junichi Yamagishi (1)

(1) University of Edinburgh, UK
(2) TCTS Lab., Numediart Institute, University of Mons, Belgium
(3) Department of Computer Science, Nagoya Institute of Technology, Japan

The flexibility of statistical parametric speech synthesis has recently led to the development of interactive speech synthesis systems in which different aspects of the voice output can be continuously controlled. The demonstration presented in this paper is based on MAGE/pHTS, a real-time synthesis system developed at the University of Mons. This system enhances the controllability and reactivity of HTS by enabling the generation of the speech parameters on the fly. The demonstration illustrates the new interaction possibilities offered by this approach. A Kinect sensor tracks the gestures and body posture of the user, and these physical parameters are mapped to the prosodic parameters of an HMM-based singing voice model. In this way, the user can directly control various aspects of the singing voice, such as vibrato, fundamental frequency, and duration. An avatar is used to encourage and facilitate user interaction.

Index Terms: Performative Speech Synthesis, Mage, Singing Voice Synthesis
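A gesture-to-prosody mapping of the kind described above could look roughly like the following Python sketch. The joint names, parameter ranges, and returned control values are illustrative assumptions; the paper does not specify the authors' actual mapping or the MAGE control API.

    # Hypothetical sketch of a gesture-to-prosody mapping. Joint names,
    # ranges, and the output format are assumptions for illustration,
    # not the authors' actual mapping or the MAGE/pHTS API.

    def map_gesture_to_prosody(skeleton):
        """Map Kinect joint positions (normalized to [0, 1]) to prosodic controls."""
        # Right-hand height drives fundamental frequency: up to two
        # octaves above a 110 Hz floor.
        f0_hz = 110.0 * 2.0 ** (2.0 * skeleton["right_hand_y"])

        # Horizontal distance between the hands stretches or compresses
        # phoneme durations.
        spread = abs(skeleton["right_hand_x"] - skeleton["left_hand_x"])
        duration_scale = 0.5 + spread  # 0.5x (hands together) to 1.5x (arms wide)

        # Left-hand height sets vibrato depth, in semitones.
        vibrato_semitones = 0.5 * skeleton["left_hand_y"]

        return {"f0_hz": f0_hz,
                "duration_scale": duration_scale,
                "vibrato_semitones": vibrato_semitones}

    if __name__ == "__main__":
        # Example frame: right hand raised high, hands apart, left hand mid-height.
        frame = {"right_hand_x": 0.8, "right_hand_y": 0.9,
                 "left_hand_x": 0.2, "left_hand_y": 0.5}
        print(map_gesture_to_prosody(frame))

In a real-time system of this kind, such a function would be called once per Kinect skeleton frame and its outputs fed to the synthesizer's parameter generation loop, which is what allows the prosody to respond continuously to the performer's movement.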

Bibliographic reference.  Veaux, Christophe / Astrinaki, Maria / Oura, Keiichiro / Clark, Robert A. J. / Yamagishi, Junichi (2013): "Real-time control of expressive speech synthesis using Kinect body tracking", in SSW8, 247-248.