Auditory-Visual Speech Processing (AVSP) 2011

Volterra, Italy
September 1-2, 2011

Kinetic Data for Large-Scale Analysis and Modeling of Face-to-Face Conversation

Jonas Beskow, Simon Alexandersson, Samer Al Moubayed, Jens Edlund, David House

Department of Speech, Music and Hearing, KTH, Sweden

Spoken face-to-face interaction is a rich and complex form of communication that includes a wide array of phenomena that are not fully explored or understood. While there have been extensive studies of many aspects of face-to-face interaction, these have traditionally been qualitative, relying on hand-annotated corpora that are typically rather limited in extent, a natural consequence of the labour-intensive task of multimodal data annotation. In this paper we present a corpus of 60 hours of unrestricted Swedish face-to-face conversations recorded with audio, video and optical motion capture, and we describe a new project that sets out to exploit primarily the kinetic data in this corpus in order to gain quantitative knowledge about human face-to-face interaction.

Index Terms: motion capture, face-to-face conversation, multimodal corpus

