ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition

April 13-16, 2003
Tokyo Institute of Technology, Tokyo, Japan

Benchmark Test for Speech Recognition Using the Corpus of Spontaneous Japanese

Tatsuya Kawahara (1), Hiroaki Nanjo (1), Takahiro Shinozaki (2), Sadaoki Furui (2)

(1) School of Informatics, Kyoto University, Japan
(2) Department of Computer Science, Tokyo Institute of Technology, Japan

We present benchmark results of automatic speech recognition using the Corpus of Spontaneous Japanese (CSJ), which has been developed in a five-year national project and is expected to become the largest spontaneous speech database. New test-sets are designed for both academic presentation speech and extemporaneous public speech, the two major categories in the corpus. The test-sets are selected to cover the variation of three acoustic and linguistic factors in spontaneous speech: word perplexity, degree of disfluency, and speaking rate. Baseline acoustic and language models are trained on an almost complete set (500 hours and 6.67M words) of the CSJ. Pronunciation variation is also modeled statistically and incorporated into the language model, based on the alignment of the large-scale transcriptions. The benchmark results confirmed the effects of the factors considered in the test-set design.
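The three test-set design factors can be made concrete with a small sketch. The code below is illustrative only and is not the authors' procedure: the `Talk` record, its field names, the words-per-second unit for speaking rate (the corpus may use another unit, such as morae per second), the ratio of tagged filler/repair tokens as a proxy for disfluency, and the evenly spaced `select_spread` strategy are all hypothetical assumptions. Word perplexity is the standard definition, the exponential of the negative mean log probability per word.

```python
import math
from dataclasses import dataclass


@dataclass
class Talk:
    """Hypothetical per-talk record; all field names are illustrative assumptions."""
    talk_id: str
    word_log_probs: list[float]  # natural-log LM probability of each word
    disfluent_words: int         # tokens tagged as fillers/repairs (assumed tagging)
    total_words: int
    duration_sec: float


def perplexity(log_probs):
    """Word perplexity: exp of the negative mean natural-log word probability."""
    return math.exp(-sum(log_probs) / len(log_probs))


def disfluency_rate(talk):
    """Assumed proxy for degree of disfluency: share of disfluent tokens."""
    return talk.disfluent_words / talk.total_words


def speaking_rate(talk):
    """Assumed unit: words per second."""
    return talk.total_words / talk.duration_sec


def select_spread(items, key, k):
    """Hypothetical selection: sort by one factor and pick k evenly spaced items,
    so the chosen subset spans the low-to-high range of that factor."""
    ranked = sorted(items, key=key)
    if k >= len(ranked):
        return ranked
    if k == 1:
        return [ranked[len(ranked) // 2]]
    step = (len(ranked) - 1) / (k - 1)
    return [ranked[round(i * step)] for i in range(k)]
```

For example, `select_spread(talks, key=lambda t: perplexity(t.word_log_probs), k=10)` would pick ten talks ranging from the easiest to the hardest in perplexity; the same call with `disfluency_rate` or `speaking_rate` as the key covers the other two factors. Balancing all three factors at once would need a joint criterion, which this sketch does not attempt.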


