This paper describes the application of the J-SUMMIT speech recognition system to spontaneous Japanese speech. J-SUMMIT is a speech recognition system for Japanese being developed at MIT within the framework of SUMMIT, which makes explicit use of acoustic-phonetic knowledge embedded in a segmental framework that can be trained automatically [2, 3]. The goals of this work are both to assess the cross-language portability of the SUMMIT speech recognition system and to support the development of a bilingual version of the VOYAGER speech understanding system. To apply J-SUMMIT to spontaneous speech, we first collected 1,400 spontaneous utterances from 40 speakers who spoke to a simulated Japanese version of the VOYAGER speech understanding system. To cover a reasonable portion of the training corpus, we extended the vocabulary to about 500 words and changed the language model from a category-pair grammar to a category bigram, trained on the training portion of the spontaneous speech corpus. We trained the system on a 34-speaker subset of the corpus and tested on the remaining six-speaker subset. On the test set, the word error rate (the sum of insertion, deletion, and substitution errors) was 14.9% and the utterance error rate was 53.3%.
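The word error rate cited above is defined as the sum of insertion, deletion, and substitution errors relative to the reference transcription. As a minimal illustrative sketch (not the paper's actual scoring code), such a metric is conventionally computed with a word-level edit-distance dynamic program; the function name and details below are assumptions for illustration:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed via word-level Levenshtein edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                       # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                       # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,   # substitution or match
                          d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1)         # insertion
    return d[len(ref)][len(hyp)] / len(ref)

# Example: one substitution and one deletion over a 4-word reference -> 0.5
print(word_error_rate("a b c d", "a x c"))
```

The utterance error rate, by contrast, counts an entire utterance as wrong if any word in it is misrecognized, which is why it is much higher (53.3%) than the word error rate (14.9%).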
Keywords: speech recognition, spontaneous speech
Bibliographic reference. Sakai, Shinsuke / Phillips, Michael (1993): "J-SUMMIT: Japanese spontaneous speech recognition", In EUROSPEECH'93, 2151-2154.