8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Dynamic Time Windows for Multimodal Input Fusion

Anurag Kumar Gupta, Tasos Anastasakos

Motorola Inc, USA

Natural interaction in multimodal dialogue systems demands a quick system response after the end of a user turn. Predicting the end of user input at each multimodal dialogue turn is complicated because users can use the modalities in any order and can convey a variety of different messages to the system within a turn. Several multimodal interaction frameworks have used fixed-duration time windows to address this problem. We conducted a user study to evaluate the use of fixed-duration time windows and to motivate further improvements. This paper describes a probabilistic method for computing an adaptive time window for multimodal input fusion. The goal is to adjust the time window dynamically depending on the user, the task, and the number of multimodal inputs in each turn. Experimental results show that the resulting system outperforms a system with fixed-duration time windows.

