AVSP 2003 - International Conference on Audio-Visual Speech Processing
September 4-7, 2003
Modern dialogue systems should interpret users' behavior and state of mind the way human beings do, that is, in a multimodal manner. Human communication is not limited to verbal utterances, as it is in most state-of-the-art dialogue systems; several modalities are involved, e.g., speech, gesture, and facial expression. A dialogue system must therefore be designed for multimodal interaction, and it has to combine all of these modalities. This paper describes the recognition of a user's internal state of mind using a prosody classifier based on artificial neural networks, combined with a discrete Hidden Markov Model (HMM) for gesture analysis. Our experiments show that both input modalities can be used to identify the user's internal state. We show that an improvement of up to 70% can be achieved by fusing both modalities.
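The abstract does not specify how the two classifiers are combined. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes one discrete HMM per user state, scored on quantized gesture symbols with the scaled forward algorithm. It also assumes a prosody ANN that already outputs class posteriors, and a weighted log-linear late fusion of the two posterior vectors. The class labels, the fusion weight `w`, the toy HMM parameters, and the prosody posteriors are all hypothetical.

```python
import numpy as np

# Hypothetical user-state labels; the paper's actual class set is not given here.
STATES = ["neutral", "satisfied", "dissatisfied"]

def hmm_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm for a discrete HMM; returns log P(obs | model).

    obs: sequence of symbol indices (e.g. quantized gesture features)
    pi:  initial state probabilities, shape (N,)
    A:   state transition matrix, shape (N, N)
    B:   emission matrix, shape (N, M) for M discrete symbols
    """
    alpha = pi * B[:, obs[0]]
    log_lik = 0.0
    for t in range(1, len(obs)):
        c = alpha.sum()              # scaling factor avoids numerical underflow
        log_lik += np.log(c)
        alpha = ((alpha / c) @ A) * B[:, obs[t]]
    return log_lik + np.log(alpha.sum())

def gesture_posteriors(obs, models):
    """Score obs with one HMM per user state; posteriors assume a uniform prior."""
    ll = np.array([hmm_log_likelihood(obs, *m) for m in models])
    p = np.exp(ll - ll.max())        # subtract max for numerical stability
    return p / p.sum()

def fuse(p_prosody, p_gesture, w=0.5):
    """Assumed late fusion: weighted log-linear combination of two posterior vectors.
    w is a hypothetical weight for the prosody stream."""
    log_p = w * np.log(p_prosody + 1e-12) + (1 - w) * np.log(p_gesture + 1e-12)
    p = np.exp(log_p - log_p.max())
    return p / p.sum()

def toy_model(k, n_symbols=3):
    """Hypothetical 2-state HMM whose emissions favor gesture symbol k."""
    pi = np.array([0.5, 0.5])
    A = np.array([[0.9, 0.1], [0.1, 0.9]])
    B = np.full((2, n_symbols), 0.1)
    B[:, k] = 0.8
    return pi, A, B

# Usage with made-up inputs: one toy gesture HMM per user state.
models = [toy_model(k) for k in range(len(STATES))]
obs = [2, 2, 1, 2]                       # quantized gesture symbols (hypothetical)
p_gesture = gesture_posteriors(obs, models)
p_prosody = np.array([0.2, 0.3, 0.5])    # stand-in for ANN prosody posteriors
p_fused = fuse(p_prosody, p_gesture)
print(STATES[int(np.argmax(p_fused))])
```

Log-linear fusion is only one common choice. A weighted sum of posteriors or a second-stage classifier trained on both score vectors would fit the same interface, and the weight `w` would normally be tuned on held-out data.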
Bibliographic reference. Shi, Rui P. / Adelhardt, Johann / Zeißler, Viktor / Batliner, Anton / Frank, Carmen / Nöth, Elmar / Niemann, Heinrich (2003): "Using speech and gesture to explore user states in multimodal dialogue systems", In AVSP 2003, 151-156.