ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition
April 13-16, 2003
The "Spontaneous Speech: Corpus and Processing Technology" project has three major parts: (1) compilation of a large spontaneous speech corpus, (2) establishment of spoken language engineering based on the corpus, and (3) development of a prototype spoken language summarization system. This paper describes how we help to develop this large corpus, i.e., part (1), using technology developed as part of (2). First, we discuss how to annotate the whole corpus morphologically. Second, we explain how we annotate sentence boundaries. Third, we discuss discourse annotation for the Corpus of Spontaneous Japanese (CSJ). This paper gives an overview of these works; their details are explained in the other papers in this volume.
Bibliographic reference. Isahara, Hitoshi (2003): "Corpus and text analysis of spontaneous Japanese", in SSPR-2003, paper MMO3.