EUROSPEECH 2003 - INTERSPEECH 2003
This paper describes an advanced spoken language corpus constructed by enhancing an in-car speech database. The corpus has the following characteristic features: (1) Advanced tags: in addition to tags for linguistic phenomena, advanced discourse tags such as sentential structures and utterance intentions are provided for the transcribed texts. (2) Large scale: sentential structures and intentions are currently provided for 45,053 phrases and 35,421 utterance units, respectively. (3) Multiple layers: the corpus consists of different levels of spoken language data, such as speech signals, transcribed texts, sentential structures, intentional markers and dialogue structures, all of which are linked to one another. The multi-layered corpus thus supports a wide variety of analyses of spontaneous spoken dialogue. This paper also reports the results of an investigation of the corpus, focusing in particular on the relations between the syntactic style and the intentional style of spoken utterances.
Bibliographic reference. Kishida, Itsuki / Irie, Yuki / Yamaguchi, Yukiko / Matsubara, Shigeki / Kawaguchi, Nobuo / Inagaki, Yasuyoshi (2003): "Construction of an advanced in-car spoken dialogue corpus and its characteristic analysis", In EUROSPEECH-2003, 1581-1584.