7th International Conference on Spoken Language Processing
September 16-20, 2002
Human communication, seen in the broader sense, is multimodal, involving the words spoken, prosody, hand gestures, head and eye gestures, body-posture variation, and facial expression. We present the multimedia Visualization for Situated Temporal Analysis (VisSTA) system for the analysis of multimodal human-communication data: video, audio, speech transcriptions, and gesture and head-orientation data. VisSTA is based on the Multiple Linked Representation (MLR) strategy and keeps the user temporally situated by ensuring tight linkage among all interface components. Each component serves as both a system controller and a display, keeping every visualized data element synchronized with the current time focus. VisSTA maintains multiple representations, including a hierarchical video-shot organization, a variety of animated graphs, animated time-synchronized multi-tier text transcriptions, and an avatar representation. All data is synchronized with the underlying video.
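The tight linkage the abstract describes, where every component is both controller and display and all views track a shared time focus, can be sketched as an observer-style design. The class and method names below (`TimeFocus`, `TranscriptView`, `on_time_changed`) are illustrative assumptions, not VisSTA's actual API:

```python
class TimeFocus:
    """Hypothetical shared time-focus state; notifies all linked views on change."""
    def __init__(self):
        self._time = 0.0
        self._views = []

    def register(self, view):
        self._views.append(view)

    def set_time(self, t, source=None):
        # Any component may move the focus; all others follow.
        self._time = t
        for view in self._views:
            if view is not source:  # do not echo back to the initiating controller
                view.on_time_changed(t)


class TranscriptView:
    """Illustrative linked view: acts as display (on_time_changed) and controller (click_word)."""
    def __init__(self, focus):
        self.focus = focus
        self.current = None
        focus.register(self)

    def on_time_changed(self, t):
        self.current = t  # e.g. scroll the transcript tier to time t

    def click_word(self, t):
        # User interaction: clicking a word retargets the shared time focus.
        self.focus.set_time(t, source=self)
```

Under this sketch, moving the focus from any one view (the transcript, a graph, the video timeline) updates every other registered view, which is the MLR linkage property the abstract claims.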
Bibliographic reference: Quek, Francis / Shi, Yang / Kirbas, Cemil / Wu, Shunguang (2002): "VisSTA: a tool for analyzing multimodal discourse data", in Proc. ICSLP-2002, pp. 221-224.