ISCA Archive AVSP 2001

Estimating focus of attention based on gaze and sound

Rainer Stiefelhagen, Jie Yang, Alex Waibel

Estimating a person's focus of attention is useful for various human-computer interaction applications, such as smart meeting rooms, where a user's goals and intent have to be monitored. In the work presented here, we are interested in modeling focus of attention in a meeting situation. We have developed a system capable of estimating participants' focus of attention from multiple cues. We employ an omnidirectional camera to simultaneously track participants' faces around a meeting table and use neural networks to estimate their head poses. In addition, we use microphones to detect who is speaking. The system predicts participants' focus of attention from acoustic and visual information separately, and then combines the output of the audio- and video-based focus of attention predictors. We have evaluated the system using the data from three recorded meetings. Adding the acoustic information reduced the error by 8% on average compared to using a single modality.
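The abstract describes combining separate audio- and video-based focus-of-attention predictors. As a minimal sketch of one such late-fusion scheme, the snippet below combines per-target probability distributions from the two modalities with a weighted sum; the function name, the weights, and the linear fusion rule are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def combine_focus_estimates(p_video, p_audio, w_video=0.7, w_audio=0.3):
    """Fuse per-target focus-of-attention probabilities from two modalities.

    p_video, p_audio: probabilities over candidate focus targets
    (e.g. the other participants around the meeting table).
    The weights and the linear-combination rule are illustrative
    assumptions; the paper does not specify its fusion rule here.
    """
    p_video = np.asarray(p_video, dtype=float)
    p_audio = np.asarray(p_audio, dtype=float)
    combined = w_video * p_video + w_audio * p_audio
    return combined / combined.sum()  # renormalize to a distribution

# Example: three candidate targets around the table
p_v = [0.6, 0.3, 0.1]   # from the head-pose (video) predictor
p_a = [0.2, 0.7, 0.1]   # from the speaker-detection (audio) predictor
print(combine_focus_estimates(p_v, p_a))
```

A weighted linear combination is only one option; the per-modality weights could also be tuned on held-out meeting data.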

Cite as: Stiefelhagen, R., Yang, J., Waibel, A. (2001) Estimating focus of attention based on gaze and sound. Proc. Auditory-Visual Speech Processing, 200

@inproceedings{stiefelhagen2001_avsp,
  author={Rainer Stiefelhagen and Jie Yang and Alex Waibel},
  title={{Estimating focus of attention based on gaze and sound}},
  booktitle={Proc. Auditory-Visual Speech Processing},
  year={2001}
}