Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

Visual Perception of Human Bodies and Faces for Multi-Modal Interfaces

Alex P. Pentland, Trevor Darrell

MIT Media Lab, Cambridge, MA, USA

In this paper we describe recent work in our laboratory on the use of computer vision techniques for real-time multi-modal interfaces. The methods described here allow non-invasive perception of human users; no special markers or identifying features are assumed. Both user-independent and user-dependent gesture-recognition algorithms are used, depending on the context. We apply the same techniques used for recognition to the generation of animated forms that accompany spoken language. Both real-time recognition and animation of facial gestures (e.g., a lip-synched "talking head") have been implemented within our framework.
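To illustrate the kind of view-based matching such vision interfaces commonly rely on (a generic sketch only, not the paper's own implementation), the snippet below scores image patches against a stored appearance template using normalized cross-correlation; the function names `ncc` and `best_match` are hypothetical.

```python
# Minimal sketch of view-based template matching via normalized
# cross-correlation (NCC). Illustrative only; not the authors' method.
import numpy as np

def ncc(patch, template):
    """Normalized cross-correlation between two equal-size arrays."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    if denom == 0.0:
        return 0.0  # flat patch or template: no meaningful correlation
    return float((p * t).sum() / denom)

def best_match(image, template):
    """Slide the template over the image; return (row, col, score)
    of the highest-scoring position."""
    H, W = image.shape
    h, w = template.shape
    best = (0, 0, -1.0)
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            s = ncc(image[r:r + h, c:c + w], template)
            if s > best[2]:
                best = (r, c, s)
    return best
```

Because NCC subtracts the mean and normalizes by variance, the score is invariant to affine changes in brightness, which is one reason such template methods can work without special markers on the user.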


Bibliographic reference. Pentland, Alex P. / Darrell, Trevor (1994): "Visual perception of human bodies and faces for multi-modal interfaces." In Proc. ICSLP-1994, 543-546.