Auditory-Visual Speech Processing (AVSP'99)
August 7-10, 1999
In this paper, we present Face Translation, a translation agent for people who speak different languages. The system not only translates a spoken utterance into another language, but also produces audio-visual output with the speaker's face and synchronized lip movement. The visual output is synthesized from real images using image-morphing techniques. Both mouth and eye movements are generated according to linguistic and social cues. An automatic feature-extraction component initializes the system; after initialization, the system generates synchronized visual output from a few pre-stored images. This makes the system well suited to video-conferencing applications with limited bandwidth. We have demonstrated the system in a travel-planning application in which a foreign tourist plans a trip with a travel agent over the Internet, in a multimedia collaborative workspace with a multimodal interface.
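The abstract only names the morphing approach; as a rough illustration, the simplest ingredient of image morphing is a cross-dissolve between two aligned key frames (a full morph, as used for lip synthesis, would also warp facial geometry between corresponding feature points). The function names and array shapes below are illustrative assumptions, not the authors' code:

```python
import numpy as np

def cross_dissolve(img_a, img_b, t):
    """Blend two aligned frames; t in [0, 1] moves from img_a to img_b."""
    return (1.0 - t) * img_a + t * img_b

def morph_sequence(img_a, img_b, n_frames):
    """Generate n_frames intermediate frames between two mouth images
    (e.g. closed-mouth and open-mouth key frames)."""
    return [cross_dissolve(img_a, img_b, t)
            for t in np.linspace(0.0, 1.0, n_frames)]
```

In a bandwidth-limited setting like the one described, only the small set of key frames needs to be transmitted once; intermediate frames can then be synthesized at the receiver.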
Bibliographic reference. Ritter, Max / Meier, Uwe / Yang, Jie / Waibel, Alex (1999): "Face translation: A multimodal translation agent", In AVSP-1999, paper #28.