Error Handling in Spoken Dialogue Systems, August 28-31, 2003
This paper presents research on the use of audiovisual prosody to signal a speaker's level of uncertainty. The first study is an experiment in which subjects are asked factual questions in a conversational setting while being filmed. Statistical analyses show that the speakers' Feeling-of-Knowing (FOK) scores correlate significantly with a number of visual and verbal properties. Interestingly, answers tend to have a higher number of marked feature settings (i.e., deviations from the neutral audiovisual expression) when the FOK score is low, while the reverse is true for non-answers. The second study is a perception experiment in which a selection of the utterances from the first study is presented to subjects in one of three conditions: vision only, sound only, or vision+sound. Results reveal that human observers can reliably distinguish HighFOK responses from LowFOK responses in all three conditions, although answers are easier to judge than non-answers, and a bimodal presentation of the stimuli makes the task easier than either unimodal presentation. The results of these two experiments are potentially relevant for improving the communication style in human-machine interaction.
Bibliographic reference. Swerts, Marc / Krahmer, Emiel / Barkhuysen, Pashiera / Laar, Lennard van de (2003): "Audiovisual cues to uncertainty", In EHSD-2003, 25-30.