Multi-Modal Dialogue in Mobile Environments (IDS-02)

June 17-19, 2002
Kloster Irsee, Germany

Toward Adaptive Conversational Interfaces: Modeling Speech Convergence with Animated Personas

Sharon Oviatt (1), Courtney Stevens (1), Rachel Coulston (1), Benfang Xiao (1), Matt Wesson (1), Cynthia Girand (2), and Evan Mellander (3)

(1) Department of Computer Science and Engineering, Oregon Health and Science University, USA
(2) Department of Linguistics, University of Colorado, Boulder, CO, USA
(3) Center for Cognitive Science, Leipzig University, Germany

During interpersonal conversation, both children and adults adapt the basic acoustic-prosodic features of their speech to converge with those of their conversational partner. However, comparable adaptation in users’ speech has not previously been explored in human-computer interaction. In this study, 7-to-10-year-old children interacted with a multimodal conversational interface in which animated characters used text-to-speech (TTS) output to answer questions about marine biology. Analysis of children’s speech input to the animated characters revealed that it adapted to more closely match the TTS output they heard. When speaking with an extroverted animated character whose speech was faster paced and louder, children significantly increased their utterance amplitude and shortened their dialogue response latencies between conversational turns. In contrast, when speaking with an introverted partner, they decreased their amplitude and lengthened their response latencies. These adaptations were dynamic, bi-directional, and generalized across different user groups and TTS voices. Implications are discussed for guiding children’s spoken language to be better synchronized with, and more easily processed by, a conversational system, and for the future development of robust and adaptive conversational interfaces.



Bibliographic reference.  Oviatt, Sharon / Stevens, Courtney / Coulston, Rachel / Xiao, Benfang / Wesson, Matt / Girand, Cynthia / Mellander, Evan (2002): "Toward adaptive conversational interfaces: Modeling speech convergence with animated personas", In IDS-2002, paper 27; 8 pp.