Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

Discourse Structure for Spontaneous Spoken Interactions: Multi-Speaker vs. Human-Computer Dialogs

Sheryl R. Young

School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

In real spoken language applications, speakers interact spontaneously and frequently diverge from the task at hand by initiating various types of sub-dialogs. Multi-speaker cooperative problem-solving dialogs evidence significantly more spontaneous phenomena than human-computer interactions. We claim that unconstrained, task-oriented spontaneous spoken dialog is structured and predictable in spite of such phenomena.

The discourse structure observed for any specific dialog is derived from the structure of the task, contextual constraints derived from prior interaction, and the characteristics of a finite set of domain-independent discourse plans for communicating, including subdialogs and topic changes. There are basically four algorithms for traversing domain plan trees and computing, for each potential application, the amount of perplexity reduction that will be achieved by applying them. However, when people interact spontaneously and speak freely, they often digress from strict verbal problem solving. To account for these behaviors, we modify our algorithms to allow for discourse structure effects, including subdialog and topic change behaviors. The new algorithms maintain our ability to constrain recognition and interpretation of spontaneous spoken utterances, and they account for subdialog phenomena as well as the meta-planning and initiative issues found in multiple-speaker dialogs. This paper focuses on the distinctions in discourse structure that appear when two persons try to jointly and cooperatively solve a problem verbally, in contrast to a human-computer interaction. Multi-speaker, mutual problem solving dialogs exhibit significant initiative-based effects, where initiative for accomplishing the task can vary from utterance to utterance. Initiative is constrained in human-computer interaction: usually only one participant has the capability to initiate solutions to problems. Similarly, meta-planning, or discussion of general problem solving constraints and of the ordering of problem solving or plan steps, is only observed in multiple-speaker interactions. We describe each of these, our algorithms for processing the phenomena, and the constraints that result from exploiting the structural regularities in multi-speaker, cooperative problem solving dialogs.

The basic model of discourse structure and plan recognition for spontaneous spoken dialog has been implemented and evaluated on a 10,000-utterance corpus in the ARPA ATIS domain and a 3,000-utterance test corpus in a lunch-ordering domain, and is being applied to a 700-dialog meeting scheduling application. The model dynamically constrains a speech recognizer, simplifies the process of inferring meaning from a spontaneous spoken utterance, and accounts for the subdialog phenomena observed. We describe these discourse plans, constraints on their occurrence and content, and their representation and processing. The model processes all subdialog phenomena using a domain plan tree, a current focus stack and a set of domain tree traversal algorithms. This paper describes a set of domain-independent discourse structure algorithms for spontaneous spoken interaction. The paper gives an overview of algorithms for traversing domain trees, that is, the set of potential plans that can be executed to solve problems in a specific application domain. It enumerates types of discourse plans and specifies how they interact with domain plans. The interaction results in constraints upon when specific types of discourse plans can occur and constraints upon their content. The paper then focuses upon unique properties of multi-speaker discourse.
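
As a rough illustration of how a domain plan tree and a focus stack might work together, the following Python sketch pushes a subdialog onto the stack and restores the earlier predictions when it is popped. It is not the paper's implementation: the class names (PlanNode, FocusStack), the method names, and the lunch-ordering plan steps are all hypothetical, and the four traversal algorithms mentioned above are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """One step in a hypothetical domain plan tree."""
    name: str
    children: List["PlanNode"] = field(default_factory=list)


class FocusStack:
    """Tracks the plan step in focus; subdialogs are pushed, then popped."""

    def __init__(self, root: PlanNode) -> None:
        self._stack = [root]

    @property
    def current(self) -> PlanNode:
        return self._stack[-1]

    def expected_next(self) -> List[str]:
        """Names of the plan steps predicted from the current focus."""
        return [child.name for child in self.current.children]

    def descend(self, step_name: str) -> None:
        """Shift focus to a child step of the current focus."""
        for child in self.current.children:
            if child.name == step_name:
                self._stack.append(child)
                return
        raise ValueError(f"{step_name!r} is not a child of {self.current.name!r}")

    def push_subdialog(self, kind: str) -> None:
        """Suspend the domain plan for a digression (assumed to have no substeps)."""
        self._stack.append(PlanNode(kind))

    def pop(self) -> PlanNode:
        """Close the step or subdialog in focus and return to the previous one."""
        if len(self._stack) == 1:
            raise IndexError("cannot pop the root plan")
        return self._stack.pop()


def build_lunch_tree() -> PlanNode:
    """A made-up plan tree for a lunch-ordering task."""
    return PlanNode("order_lunch", [
        PlanNode("choose_item", [
            PlanNode("choose_sandwich"),
            PlanNode("choose_drink"),
        ]),
        PlanNode("confirm_order"),
    ])


if __name__ == "__main__":
    focus = FocusStack(build_lunch_tree())
    focus.descend("choose_item")
    print(focus.expected_next())          # predictions under the current focus
    focus.push_subdialog("clarification")  # speaker digresses
    print(focus.expected_next())          # no domain predictions inside the digression
    focus.pop()                            # digression closed
    print(focus.expected_next())          # earlier predictions are available again
```

In this sketch the list returned by expected_next stands in for the kind of constraint a recognizer could use; how the model described here actually derives and applies its constraints is not specified in this abstract.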

Bibliographic reference.  Young, Sheryl R. (1994): "Discourse structure for spontaneous spoken interactions: multi-speaker vs. human-computer dialogs", In ICSLP-1994, 2227-2230.