The speech conductor: gestural control of speech synthesis

Christophe d'Alessandro, Nicolas D'Alessandro, Sylvain Le Beux, Juraj Simko, Feride Çeti, Hannes Pirker

The Speech Conductor project aimed to develop a gesture interface for driving ("conducting") a speech synthesis system. Four real-time, gesture-controlled synthesis systems were developed. For the first two systems, the effort focused on high-quality voice source synthesis. These "Baby Synthesizers" are based on formant synthesis and include refined voice source components: one uses an augmented LF model (including an aperiodic component), and the other uses a Causal/Anticausal Linear Model (CALM) of the voice source, also augmented with an aperiodic component. The other two systems can utter unrestricted speech; they are based on the MaxMBROLA and MidiMBROLA applications. All four systems are controlled by various gesture devices. Informal testing and public demonstrations showed that very natural and expressive synthetic voices can be produced in real time with some combinations of input device and synthesis system.

Cite as: d'Alessandro, C., D'Alessandro, N., Le Beux, S., Simko, J., Çeti, F., Pirker, H. (2005) The speech conductor: gestural control of speech synthesis. Proc. Summer Workshop on Multimodal Interfaces (eINTERFACE 2005), 52-61

