7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

VisSTA: a Tool for Analyzing Multimodal Discourse Data

Francis Quek, Yang Shi, Cemil Kirbas, Shunguang Wu

Wright State University, USA

Human communication, seen in the broader sense, is multimodal, involving the words spoken, prosody, hand gestures, head and eye gestures, body posture variation, and facial expression. We present the multimedia Visualization for Situated Temporal Analysis (VisSTA) system for the analysis of multimodal human-communication data: video, audio, speech transcriptions, and gesture and head-orientation data. VisSTA is based on the Multiple Linked Representation (MLR) strategy and keeps the user temporally situated by ensuring tight linkage among all interface components. Each component serves both as a system controller and as a display, keeping every data element being visualized synchronized with the current time focus. VisSTA maintains multiple representations, including a hierarchical video-shot organization, a variety of animated graphs, animated time-synchronized multi-tier text transcriptions, and an avatar representation. All data are synchronized with the underlying video.
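To illustrate the MLR idea of components that act as both controller and display, the following minimal Python sketch (not the authors' implementation; all class and method names are illustrative assumptions) shows how several views can link to a shared time focus so that scrubbing or clicking in any one representation re-situates all the others at the same instant.

```python
from typing import Callable, List


class TimeFocus:
    """Shared model holding the current time focus; notifies all linked views."""

    def __init__(self) -> None:
        self._time: float = 0.0
        self._listeners: List[Callable[[float], None]] = []

    def link(self, listener: Callable[[float], None]) -> None:
        self._listeners.append(listener)

    def set_time(self, t: float) -> None:
        self._time = t
        for listener in self._listeners:
            listener(t)  # every representation updates to the same instant


class TranscriptView:
    """Displays the transcript tier active at the current time (stub)."""

    def __init__(self, focus: TimeFocus) -> None:
        self.focus = focus
        focus.link(self.on_time)

    def on_time(self, t: float) -> None:
        print(f"[transcript] showing words around t={t:.2f}s")

    def click_word(self, word_onset: float) -> None:
        # Acting as a controller: clicking a word re-situates every other view.
        self.focus.set_time(word_onset)


class GestureGraphView:
    """Animated graph of, e.g., hand-motion traces (stub)."""

    def __init__(self, focus: TimeFocus) -> None:
        focus.link(self.on_time)

    def on_time(self, t: float) -> None:
        print(f"[graph] cursor moved to t={t:.2f}s")


if __name__ == "__main__":
    focus = TimeFocus()
    transcript = TranscriptView(focus)
    GestureGraphView(focus)
    transcript.click_word(12.5)  # both views jump to 12.5 s
```

In this hypothetical sketch the video, graph, transcript, and avatar views would all register with the same TimeFocus, which is how tight linkage among interface components keeps every visualized data element synchronized.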


Full Paper

Bibliographic reference.  Quek, Francis / Shi, Yang / Kirbas, Cemil / Wu, Shunguang (2002): "VisSTA: a tool for analyzing multimodal discourse data", In ICSLP-2002, 221-224.