ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition
April 13-16, 2003
The Corpus of Spontaneous Japanese (CSJ) is a large-scale database of spontaneous Japanese. It contains speech signals and transcriptions of about 7 million words, along with various annotations such as part-of-speech (POS) and phonetic labels. This paper describes the design issues of the CSJ and presents a preliminary evaluation. The results strongly suggest the usefulness of the CSJ as a resource for the study of spontaneous speech.
Bibliographic reference. Maekawa, Kikuo (2003): "Corpus of spontaneous Japanese: its design and evaluation", in SSPR-2003, paper MMO2.