Auditory-Visual Speech Processing 2007 (AVSP2007)
Kasteel Groenendaal, Hilvarenbeek, The Netherlands
This paper reports on the ACQA (Animated agent for Conversational Question Answering) project conducted at LIMSI. The aim is to design an expressive animated conversational agent (ACA) for conducting research along two main lines: (1) perceptual experiments (e.g., perception of expressivity and of 3D movements in both the audio and visual channels); (2) the design of human-computer interfaces requiring head models at different resolutions and the integration of the talking head into virtual scenes. The target application of this expressive ACA is a real-time, speech-based question-answering system developed at LIMSI (RITEL). The architecture of the system is based on distributed modules exchanging messages through a network protocol. The main components of the system are as follows: RITEL, a question-answering system searching raw text, produces a text (the answer) together with attitudinal information; this attitudinal information is then processed to deliver expressive tags; the text is converted into phoneme, viseme, and prosodic descriptions. Audio speech is generated by the LIMSI selection-concatenation text-to-speech engine. Visual speech uses MPEG-4 keypoint-based animation and is rendered in real time by Virtual Choreographer (VirChor), a GPU-based 3D engine. Finally, visual and audio speech are played together in a 3D audio-visual scene. The project also devotes considerable effort to realistic 3D audio and visual rendering. A new model of phoneme-dependent human radiation patterns is included in the speech synthesis system, so that the ACA can move in the virtual scene with realistic 3D visual and audio rendering.
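The pipeline step that converts text into phoneme and viseme descriptions for MPEG-4 keypoint-based animation can be sketched as a phoneme-to-viseme lookup. This is an illustrative sketch only, not the authors' implementation: the phoneme symbols and the partial viseme table below are assumptions drawn from the MPEG-4 facial animation viseme groups, where index 0 denotes the neutral face.

```python
# Illustrative sketch: mapping phonemes to MPEG-4 viseme indices for
# keypoint-based facial animation. The table is a partial, assumed
# subset of the MPEG-4 viseme groups (0 = neutral/none).
MPEG4_VISEMES = {
    "p": 1, "b": 1, "m": 1,   # bilabial closure
    "f": 2, "v": 2,           # labiodental
    "t": 4, "d": 4,           # alveolar stops
    "k": 5, "g": 5,           # velar stops
    "s": 7, "z": 7,           # alveolar fricatives
}

def phonemes_to_visemes(phonemes):
    """Map a phoneme sequence to MPEG-4 viseme indices.

    Phonemes outside the (partial) table fall back to the neutral
    viseme 0; a full system would cover all phoneme classes.
    """
    return [MPEG4_VISEMES.get(p, 0) for p in phonemes]

if __name__ == "__main__":
    print(phonemes_to_visemes(["b", "a", "s"]))
```

In the actual system, each viseme index would drive the corresponding MPEG-4 facial animation parameters of the head model, time-aligned with the prosodic description delivered by the text-to-speech engine.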
Bibliographic reference. Martin, Jean-Claude / Jacquemin, Christian / Pointal, Laurent / Katz, Brian / D'Alessandro, Christophe / Max, Aurélien / Courgeon, M. (2007): "A 3D audio-visual animated agent for expressive conversational question answering", In AVSP-2007, paper P40.