A new project on multi-modal analysis of poster sessions is introduced. We have designed an environment dedicated to recording poster conversations with multiple sensors and have collected a number of sessions, which are annotated with a variety of multi-modal information, including utterance units for individual speakers, backchannels, nodding, gazing, and pointing. Automatic speaker diarization, which combines speech activity detection and speaker identification, is conducted using a set of distant microphones, and reasonable performance is obtained. We then investigate automatic classification of conversation segments into two modes: presentation mode and question-answer mode. Preliminary experiments show that multi-modal features derived from nonverbal behaviors play a significant role in indexing this kind of conversation.
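The two-mode segment classification described above could be sketched in a highly simplified form as follows; the feature names, weights, and decision rule here are illustrative assumptions, not the features or classifier actually used in the paper.

```python
from dataclasses import dataclass

@dataclass
class SegmentFeatures:
    # Hypothetical per-segment counts of multi-modal events (not from the paper).
    audience_utterances: int       # utterance units produced by audience speakers
    backchannels: int              # short backchannel tokens by the audience
    nods: int                      # nodding events by the audience
    gaze_shifts_to_audience: int   # presenter's gaze shifts toward the audience

def classify_segment(f: SegmentFeatures) -> str:
    """Toy rule: frequent audience speech suggests question-answer mode,
    while sparse audience speech accompanied mainly by backchannels and
    nodding suggests presentation mode. Thresholds are illustrative only."""
    interaction_score = f.audience_utterances + 0.5 * f.gaze_shifts_to_audience
    passive_score = f.backchannels + f.nods
    return "question-answer" if interaction_score > passive_score else "presentation"

# Example: many audience utterances -> question-answer mode.
print(classify_segment(SegmentFeatures(8, 1, 0, 4)))   # question-answer
# Example: mostly backchannels and nods -> presentation mode.
print(classify_segment(SegmentFeatures(1, 6, 5, 0)))   # presentation
```

In practice such a classifier would be trained on the annotated corpus rather than hand-thresholded; the sketch only illustrates how nonverbal cues can separate the two modes.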
Bibliographic reference. Kawahara, Tatsuya / Setoguchi, Hisao / Takanashi, Katsuya / Ishizuka, Kentaro / Araki, Shoko (2008): "Multi-modal recording, analysis and indexing of poster sessions", In INTERSPEECH-2008, 1622-1625.