Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Integrating Prosodic Features in Dialogue Understanding

Masafumi Tamoto, Masahito Kawamori, Takeshi Kawabata

NTT Laboratories, Atsugi, Japan

We report our studies on the functions of prosodic information in dialogue, and the results of a dialogue experiment using a speech understanding system that incorporates a discrimination schema for illocutionary acts based on prosodic features obtained from human-human dialogues. To construct a speech understanding system with an 'effortless' interface, it is necessary to model coordination in dialogue. This model needs to capture sequential constraints on illocutionary acts, such as answers following questions, acceptances or rejections following requests, and acknowledgements following assertions. However, recognizing these speech acts by simple, superficial analysis of dialogue is often difficult because disfluencies such as omission and interruption abound in spontaneous dialogues. Prosodic features are important in this respect because they often help identify the speech act of an utterance when explicit linguistic information is missing. To investigate how prosodic information is used, along with other linguistic information, to identify speech acts, we performed a series of experiments. We collected task-oriented spontaneous dialogues between human subjects and extracted sentences that represent dialogue control structure. A different set of subjects was then asked to identify the sentence type and intonation contour of these sentences. Given the transcription of the extracted sentences with contextual information, the subjects were able to identify the speech act types of about 85% of the 290 sentences. The subjects were then asked to identify the intonation contour of the same sentences by listening to utterances modified so that all voiced sounds were replaced by a sinusoid, leaving only the fundamental frequency of the utterance audible.
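The sinusoid substitution described above can be illustrated with a minimal sketch. It assumes a per-frame fundamental frequency track is already available, with 0 marking unvoiced frames. The function name, frame length, sample rate and amplitude are illustrative assumptions and are not taken from the paper.

```python
import math

def f0_to_sinusoid(f0_frames, frame_len=0.01, sample_rate=16000, amplitude=0.5):
    """Render an F0 track as a sine wave (hypothetical helper, not the authors' tool).

    f0_frames: one F0 value in Hz per frame; 0 (or less) marks an unvoiced frame.
    Voiced frames become a sinusoid at the frame's F0; unvoiced frames become silence.
    """
    samples_per_frame = int(round(frame_len * sample_rate))
    samples = []
    phase = 0.0
    for f0 in f0_frames:
        if f0 > 0:
            # Accumulate phase so the tone stays continuous across voiced frames.
            step = 2.0 * math.pi * f0 / sample_rate
            for _ in range(samples_per_frame):
                phase += step
                samples.append(amplitude * math.sin(phase))
        else:
            # Unvoiced frame: emit silence and restart the phase.
            samples.extend([0.0] * samples_per_frame)
            phase = 0.0
    return samples
```

Calling `f0_to_sinusoid([0, 120, 125, 130, 0])` yields five frames of audio samples in which only the three voiced frames carry a tone, so a listener hears the pitch movement without any segmental content.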
We observed the syntactic and prosodic properties of those utterances. Speech acts were represented as three basic categories: the illocutions of assertion, question and request. Similarly, sentence types were represented as declarative, interrogative and imperative, and intonations were classified into rise-up, fall-down and neutral pitch contours. We then set up a simulation task for a dialogue understanding system that incorporates the results of the human-human dialogue experiment. Sentence types were identified by human subjects. Intonation contours were identified using an algorithm that calculates the range and slope of the upper and lower bounds of the unwarped segmental contour and matches these against predefined contour templates.
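As a rough illustration of this kind of contour matching, the sketch below computes the F0 range and the slopes of upper and lower bounds, then picks the nearest template. The abstract does not specify how the bounds, templates or distance are defined, so the sliding-window envelope, the template values and the scaling constants here are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical templates: (F0 range in Hz, upper-bound slope, lower-bound slope in Hz/s).
TEMPLATES = {
    "rise": (40.0, 100.0, 100.0),
    "fall": (40.0, -100.0, -100.0),
    "neutral": (5.0, 0.0, 0.0),
}
# Assumed per-feature scales so range and slopes contribute comparably.
SCALES = (40.0, 100.0, 100.0)

def fit_slope(points):
    """Least-squares slope of (time, value) points; 0.0 if it cannot be computed."""
    if len(points) < 2:
        return 0.0
    t_mean = mean(t for t, _ in points)
    v_mean = mean(v for _, v in points)
    denom = sum((t - t_mean) ** 2 for t, _ in points)
    if denom == 0:
        return 0.0
    return sum((t - t_mean) * (v - v_mean) for t, v in points) / denom

def contour_features(times, f0, window=1):
    """Range plus slopes of running-max (upper) and running-min (lower) bounds."""
    if not f0 or len(times) != len(f0):
        raise ValueError("times and f0 must be non-empty and of equal length")
    upper, lower = [], []
    for i, t in enumerate(times):
        chunk = f0[max(0, i - window): i + window + 1]
        upper.append((t, max(chunk)))
        lower.append((t, min(chunk)))
    return (max(f0) - min(f0), fit_slope(upper), fit_slope(lower))

def classify_contour(times, f0):
    """Return the name of the template closest to the contour's features."""
    feats = contour_features(times, f0)
    def distance(template):
        return sum(((a - b) / s) ** 2 for a, b, s in zip(feats, template, SCALES))
    return min(TEMPLATES, key=lambda name: distance(TEMPLATES[name]))
```

With voiced-frame times in seconds and F0 in Hz, `classify_contour(times, f0)` returns one of the three labels used above. In a real system the templates would be derived from labeled data rather than fixed by hand.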


Bibliographic reference.  Tamoto, Masafumi / Kawamori, Masahito / Kawabata, Takeshi (1999): "Integrating prosodic features in dialogue understanding", In EUROSPEECH'99, 37-40.