9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Multi-Modal Recording, Analysis and Indexing of Poster Sessions

Tatsuya Kawahara (1), Hisao Setoguchi (1), Katsuya Takanashi (1), Kentaro Ishizuka (2), Shoko Araki (2)

(1) Kyoto University, Japan; (2) NTT Corporation, Japan

A new project on multi-modal analysis of poster sessions is introduced. We have designed an environment dedicated to recording poster conversations with multiple sensors and have collected a number of sessions. These are annotated with a variety of multi-modal information, including utterance units for individual speakers, backchannels, nodding, gazing, and pointing. Automatic speaker diarization, which combines speech activity detection and speaker identification, is conducted using a set of distant microphones, and reasonable performance is obtained. We then investigate automatic classification of conversation segments into two modes: presentation mode and question-answer mode. Preliminary experiments show that multi-modal features of nonverbal behaviors play a significant role in indexing this kind of conversation.
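The abstract describes diarization only as speech activity detection plus speaker identification over distant microphones, without giving the method. The sketch below illustrates that two-stage idea under loudly stated assumptions. It is not the authors' system. It swaps in a simple frame-energy threshold for speech activity detection and a "loudest microphone wins" rule for speaker identification, and it assumes one distant microphone placed nearest to each participant. The function names `diarize_frames` and `to_segments` and the `-40 dB` threshold are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def frame_energy_db(samples: List[float]) -> float:
    """Mean-square energy of one frame in dB, floored to avoid log(0)."""
    power = sum(s * s for s in samples) / max(len(samples), 1)
    return 10.0 * math.log10(max(power, 1e-12))


def diarize_frames(
    mic_frames: List[List[List[float]]],  # mic_frames[m][t] = samples of frame t on mic m
    threshold_db: float = -40.0,           # hypothetical speech/non-speech threshold
) -> List[Optional[int]]:
    """Label each frame with a speaker index, or None when no speech is detected."""
    labels: List[Optional[int]] = []
    for t in range(len(mic_frames[0])):
        energies = [frame_energy_db(mic[t]) for mic in mic_frames]
        best = max(range(len(energies)), key=lambda m: energies[m])
        # Speech activity detection (assumed): the loudest channel must exceed the threshold.
        if energies[best] < threshold_db:
            labels.append(None)
        else:
            # Speaker identification (assumed): speaker m sits closest to microphone m.
            labels.append(best)
    return labels


def to_segments(labels: List[Optional[int]]) -> List[Tuple[int, int, int]]:
    """Merge runs of identical frame labels into (speaker, start, end_exclusive) segments."""
    segments: List[Tuple[int, int, int]] = []
    start = 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            if labels[start] is not None:  # drop non-speech runs
                segments.append((labels[start], start, t))
            start = t
    return segments
```

For example, with two microphones and three frames, where speaker 0 talks, then nobody, then speaker 1, `diarize_frames` yields `[0, None, 1]`. A real system on poster conversations would need to cope with crosstalk and with participants moving around the poster, which a per-frame energy rule does not handle.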


Bibliographic reference. Kawahara, Tatsuya / Setoguchi, Hisao / Takanashi, Katsuya / Ishizuka, Kentaro / Araki, Shoko (2008): "Multi-modal recording, analysis and indexing of poster sessions", In INTERSPEECH-2008, 1622-1625.