September 18-20, 2000
Paris, France

A Japanese National Project on Spontaneous Speech Corpus and Processing Technology

Sadaoki Furui (1), Kikuo Maekawa (2), and Hitoshi Isahara (3)

(1) Tokyo Institute of Technology, Tokyo, Japan
(2) The National Language Research Institute, Tokyo, Japan
(3) Communications Research Laboratory, Nishi-ku, Kobe, Japan

A new national project to raise the technological level of speech recognition and understanding has recently commenced in Japan. The project aims at a) building a large-scale spontaneous speech corpus consisting of roughly 7M words and 800 hours of speech, b) developing acoustic and linguistic models for spontaneous speech understanding and summarization that use both linguistic and para-linguistic information in speech, and c) building a prototype of a spontaneous speech summarization system. The corpus under compilation will contain spontaneously uttered Common Japanese speech together with morphologically annotated transcriptions. Segmental and intonation labeling will also be provided for a subset of the corpus. The primary application domain of the corpus is recognition of spontaneous speech, but it is also intended to serve as a research corpus for natural language processing and for phonetic and linguistic studies.

Bibliographic reference. Furui, Sadaoki / Maekawa, Kikuo / Isahara, Hitoshi (2000): "A Japanese national project on spontaneous speech corpus and processing technology", in ASR-2000, 244-248.