8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Construction of an Advanced In-Car Spoken Dialogue Corpus and its Characteristic Analysis

Itsuki Kishida, Yuki Irie, Yukiko Yamaguchi, Shigeki Matsubara, Nobuo Kawaguchi, Yasuyoshi Inagaki

Nagoya University, Japan

This paper describes an advanced spoken language corpus constructed by enhancing an in-car speech database. The corpus has the following characteristic features: (1) Advanced tag: in addition to linguistic phenomena tags, advanced discourse tags such as sentential structures and utterance intentions have been provided for the transcribed texts. (2) Large-scale: sentential structures and intentions are currently provided for 45,053 phrases and 35,421 utterance units, respectively. (3) Multi-layer: the corpus consists of different levels of spoken language data, such as speech signals, transcribed texts, sentential structures, intentional markers and dialogue structures, and these levels are linked to one another. The multi-layered corpus thus supports a wide variety of analyses of spontaneous spoken dialogue. This paper also reports the results of an investigation of the corpus, focusing in particular on the relations between the syntactic style and the intentional style of spoken utterances.

Bibliographic reference. Kishida, Itsuki / Irie, Yuki / Yamaguchi, Yukiko / Matsubara, Shigeki / Kawaguchi, Nobuo / Inagaki, Yasuyoshi (2003): "Construction of an advanced in-car spoken dialogue corpus and its characteristic analysis", in EUROSPEECH-2003, 1581-1584.