Third International Conference on Spoken Language Processing (ICSLP 94)
We are developing multimodal man-machine interfaces through which users can communicate by integrating speech, gaze, facial expressions, and gestures such as nodding and finger pointing. Such multimodal interfaces are expected to provide more flexible, natural, and productive communication between humans and computers. To achieve this goal, we have taken the approach of modeling human behaviors in the context of ordinary face-to-face conversations. As a first step, we have implemented a system that uses video and audio recording equipment to capture verbal and nonverbal information in interpersonal communication. Using this system, we collected data from a task-oriented conversation between a guest (subject) and a receptionist at a company reception desk, and quantitatively analyzed the data with respect to multiple modalities. This paper presents data showing that head nodding and gaze are related to speech content, acting to supplement speech information. We also discuss issues related to the timing of turn-taking and listener responses, which yield a natural rhythm for human/computer interaction.
Bibliographic reference. Watanuki, Keiko / Sakamoto, Kenji / Togawa, Fumio (1994): "Analysis of multimodal interaction data in human communication", in ICSLP-1994, 899-902.