ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition
April 13-16, 2003
We present benchmark results of automatic speech recognition using the Corpus of Spontaneous Japanese (CSJ), which has been developed in a five-year national project and will be one of the largest spontaneous speech databases. New test-sets are designed for both academic presentation speech and extemporaneous public speech, which are the two major categories in the corpus. The test-sets are selected to cover the variation of acoustic and linguistic factors in spontaneous speech: word perplexity, degree of disfluency, and speaking rate. Baseline acoustic and language models are set up using an almost complete set (500 hours and 6.67M words) of the CSJ. Statistical modeling of pronunciation variation is also incorporated into the language model, based on the alignment of large-scale transcriptions. The benchmark results verified the effects of the factors considered in the test-set design.
Bibliographic reference. Kawahara, Tatsuya / Nanjo, Hiroaki / Shinozaki, Takahiro / Furui, Sadaoki (2003): "Benchmark test for speech recognition using the corpus of spontaneous Japanese", in SSPR-2003, paper TMO4.