7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Distributed Audio-Visual Speech Synchronization

Peter Poller, Jochen Müller

DFKI, Germany

The main scientific goal of the SmartKom project is to develop a new human-machine interaction metaphor for multimodal dialog systems. It combines speech, gesture, and facial expression input with speech, gesture, and graphics output. The system is realized as a distributed collection of communicating and cooperating autonomous modules based on a multi-blackboard architecture. Multimodal output generation is consequently separated into two steps. First, the modality-specific output data are generated. Second, these data are synchronized across independent media devices to present the multimodal output to the user. This paper describes the generation of appropriate lip animations based on a phonetic representation of the speech output signal and, as a second computational step, the timestamp-based realization of audio-visual speech output on distributed media devices.
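The two computational steps named in the abstract (phoneme-driven lip animation and timestamp-based playback on independent devices) can be made concrete with a minimal sketch. The Python code below is illustrative only and not taken from the paper: the phoneme-to-viseme table, the Segment type, and the shared utterance start time are all assumptions, and a real distributed deployment would additionally require the media devices to share a common clock.

```python
import time
from dataclasses import dataclass

# Hypothetical phoneme-to-viseme table; the paper's actual mapping is not
# given in the abstract, so this is an illustrative reduction.
PHONEME_TO_VISEME = {
    "p": "bilabial_closed", "b": "bilabial_closed", "m": "bilabial_closed",
    "f": "labiodental", "v": "labiodental",
    "a": "open", "o": "rounded", "u": "rounded", "i": "spread",
    "sil": "neutral",
}

@dataclass
class Segment:
    phoneme: str   # phoneme label from the synthesized speech output
    start_ms: int  # segment start time relative to utterance start
    end_ms: int    # segment end time

def to_viseme_track(segments):
    """Step 1: derive a timestamped viseme track from the phonetic
    representation of the speech output signal."""
    return [(s.start_ms, PHONEME_TO_VISEME.get(s.phoneme, "neutral"))
            for s in segments]

def play_synchronized(viseme_track, utterance_start):
    """Step 2: timestamp-based rendering on an independent media device.
    Given the agreed utterance start time, the renderer schedules each
    viseme autonomously; the audio player does the same with the signal."""
    for start_ms, viseme in viseme_track:
        target = utterance_start + start_ms / 1000.0
        delay = target - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # wait until the segment's timestamp
        print(f"{start_ms:5d} ms -> show viseme '{viseme}'")

if __name__ == "__main__":
    phonetic = [Segment("sil", 0, 50), Segment("b", 50, 120),
                Segment("a", 120, 300), Segment("m", 300, 380)]
    # Agreed start time, e.g. announced to all devices via the blackboard.
    start = time.monotonic() + 0.1
    play_synchronized(to_viseme_track(phonetic), start)
```

The point of a timestamp-based scheme of this kind is that no device streams frames to another: once the phonetic timestamps and a common start time are known, each media device can realize its share of the audio-visual output independently.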



Bibliographic reference. Poller, Peter / Müller, Jochen (2002): "Distributed audio-visual speech synchronization", in Proc. ICSLP-2002, 205-208.