ISCA Archive SPECOM 2004

An approach to a multimodal man-machine communication system

Dario Alonso Rodriguez-Suarez, Maria José Sanchez-Martinez

This paper presents an approach to an intelligent man-machine communication system. The idea is to implement a so-called Virtual Personal Assistant (VPA) in the form of an animated, speaking 3D human head model that interacts with a human user through natural communication channels: speech, lip movements, facial expressions and gaze. Integrating these channels eases communication and makes it more robust. As a first step towards such a system, we present an audio-visual speech recognition architecture in which the video signal from the user's mouth region is used along with the audio signal to recognize speech. Driven by the recognized sentences, a virtual 3D head with gaze control and lip-synchronous speech output reacts accordingly. For this virtual assistant, a behavior-generating mechanism based on a dynamical systems approach is implemented: the overall behavior of the humanoid is generated by means of nonlinear differential equations. An e-commerce scenario, in which the virtual assistant describes items selected by the user, was chosen as the environment for the dialog.


Cite as: Rodriguez-Suarez, D.A., Sanchez-Martinez, M.J. (2004) An approach to a multimodal man-machine communication system. Proc. 9th Conference on Speech and Computer (SPECOM 2004), 65-72

@inproceedings{rodriguezsuarez04_specom,
  author={Dario Alonso Rodriguez-Suarez and Maria José Sanchez-Martinez},
  title={{An approach to a multimodal man-machine communication system}},
  year=2004,
  booktitle={Proc. 9th Conference on Speech and Computer (SPECOM 2004)},
  pages={65--72}
}