ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition

April 13-16, 2003
Tokyo Institute of Technology, Tokyo, Japan

Corpus of Spontaneous Japanese: its Design and Evaluation

Kikuo Maekawa

The National Institute for Japanese Language, Dept. Language Research, Tokyo, Japan

The Corpus of Spontaneous Japanese (CSJ) is a large-scale database of spontaneous Japanese. It contains speech signals and transcriptions of about 7 million words, along with annotations such as part-of-speech (POS) and phonetic labels. After describing the design issues of the CSJ, this paper presents a preliminary evaluation of the corpus. The results strongly suggest that the CSJ is a useful resource for the study of spontaneous speech.



Bibliographic reference. Maekawa, Kikuo (2003): "Corpus of spontaneous Japanese: its design and evaluation", in SSPR-2003, paper MMO2.