This paper presents an approach to an intelligent man-machine communication system. The idea is to implement a so-called Virtual Personal Assistant (VPA) in the form of an animated, speaking 3D human head model that interacts with a human user through natural communication channels: speech, lip movements, facial expressions, and gaze. The integration of these channels eases communication and makes it more robust. As a first step towards such a system, we present an audio-visual speech recognition architecture: the video signal from the user's mouth region is used along with the audio signal to recognize speech. Driven by the recognized sentences, a virtual 3D head with gaze control and lip-synchronous speech output reacts accordingly. For this virtual assistant, a behavior-generating mechanism is implemented, based on a dynamical systems approach: the overall behavior of the humanoid is generated by means of nonlinear differential equations. As the environment for the dialog, an e-commerce scenario was chosen, in which the virtual assistant describes items selected by the user.
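The abstract does not specify the form of the nonlinear differential equations. As a purely illustrative sketch, behavior selection in dynamical-systems approaches is often modeled as competitive dynamics, where activation variables for candidate behaviors inhibit one another until one wins. The equation, variable names, and the three example behaviors below are assumptions, not taken from the paper:

```python
import numpy as np

def competitive_dynamics(alpha, beta=2.0, dt=0.01, steps=5000, seed=0):
    """Euler-integrate an assumed competitive behavioral dynamics:
        du_i/dt = u_i * (alpha_i - u_i^2 - beta * sum_{j != i} u_j^2)
    Each u_i is the activation of one candidate behavior; the behavior
    with the largest competitive advantage alpha_i suppresses the rest
    (winner-take-all), yielding one active behavior at a time.
    """
    alpha = np.asarray(alpha, dtype=float)
    rng = np.random.default_rng(seed)
    u = 0.1 * rng.random(len(alpha))  # small random initial activations
    for _ in range(steps):
        total = np.sum(u ** 2)
        du = u * (alpha - u ** 2 - beta * (total - u ** 2))
        u = u + dt * du
    return u

# Hypothetical candidate behaviors: "greet", "describe item", "idle".
# "describe item" has the highest advantage and wins the competition;
# its activation settles near sqrt(alpha_winner), the others decay to ~0.
u = competitive_dynamics([1.0, 2.0, 0.5])
```

With `beta * alpha_winner` larger than every other `alpha_i`, the winning fixed point is stable, so the system converges to exactly one active behavior, which is the property such formulations exploit for conflict-free behavior arbitration.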
Cite as: Rodriguez-Suarez, D.A., Sanchez-Martinez, M.J. (2004) An approach to a multimodal man-machine communication system. Proc. 9th Conference on Speech and Computer (SPECOM 2004), 65-72
@inproceedings{rodriguezsuarez04_specom,
  author    = {Dario Alonso Rodriguez-Suarez and Maria José Sanchez-Martinez},
  title     = {{An approach to a multimodal man-machine communication system}},
  year      = {2004},
  booktitle = {Proc. 9th Conference on Speech and Computer (SPECOM 2004)},
  pages     = {65--72}
}