INTERSPEECH 2006 - ICSLP
Predicting the end of user input turns in a multimodal system can be complex. User interactions vary across a spectrum from single, unimodal inputs to multimodal combinations delivered either simultaneously or sequentially. Early multimodal systems used a fixed-duration temporal threshold to determine how long to wait for the next input before processing and integration. Several recent studies have proposed using dynamic or adaptive temporal thresholds to predict turn segmentation and thus achieve faster system response times. We introduce an approach that requires no temporal threshold. First, we contrast current multimodal command interfaces with a new class of cumulative-observant multimodal systems that we introduce. Within that new system class, we show how our technique of edge-splitting, combined with our strategy for under-specified, no-wait visual feedback, resolves parsing problems that underlie turn segmentation errors. Test results show a significant 46.2% reduction in multimodal recognition errors, compared to not using these techniques.
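To make the baseline concrete, the fixed temporal threshold the abstract contrasts against can be sketched as a simple gap-based turn segmenter: events from any modality are grouped into one turn for fusion until the silence between consecutive inputs exceeds the threshold. This is an illustrative sketch of the conventional approach, not the paper's edge-splitting technique; the `InputEvent` type, field names, and threshold value are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class InputEvent:
    modality: str   # e.g. "speech" or "pen" (hypothetical labels)
    time: float     # arrival time in seconds

def segment_turns(events: List[InputEvent], threshold: float) -> List[List[InputEvent]]:
    """Group time-ordered input events into turns: a gap longer than
    `threshold` seconds between consecutive events closes the turn,
    after which the grouped inputs would be sent to fusion."""
    turns: List[List[InputEvent]] = []
    current: List[InputEvent] = []
    for ev in events:
        if current and ev.time - current[-1].time > threshold:
            turns.append(current)
            current = []
        current.append(ev)
    if current:
        turns.append(current)
    return turns
```

Under this scheme the system must always wait out the full threshold before integrating, which is the response-time cost the paper's no-wait approach avoids.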
Bibliographic reference. Kaiser, Edward C. / Barthelmess, Paulo (2006): "Edge-splitting in a cumulative multimodal system, for a no-wait temporal threshold on information fusion, combined with an under-specified display", In INTERSPEECH-2006, paper 2016-Tue2CaP.12.