Sixth International Conference on Spoken Language Processing (ICSLP 2000)

Beijing, China
October 16-20, 2000

Language Modeling by Stochastic Dependency Grammar for Japanese Speech Recognition

Akinori Ito (1), Chiori Hori (2), Masaharu Katoh (1), Masaki Kohda (1)

(1) Faculty of Engineering, Yamagata University, Japan
(2) Tokyo Institute of Technology, Japan

This paper describes a language modeling technique based on a stochastic dependency grammar (SDG), a kind of stochastic context-free grammar (SCFG). Two improvements are made to the general CFG-based SCFG model. The first is to use a restricted grammar instead of a general CFG. The dependency grammar used here is a restricted CFG that expresses modification between two words or phrases. The derivation probabilities are estimated by the inside-outside algorithm. The computational complexity of the estimation is reduced from O(N^3 L^3) to O(N^2 L^3), where N and L denote the number of nonterminals and the length of a sentence, respectively. The second is word grouping, which is introduced to reduce the estimation time further. The basic idea is that a regular grammar is applied within a group and a CFG is used to express inter-group relationships. A new algorithm is introduced to realize this idea. When a group contains two words on average, the learning time is reduced to about one-eighth. Two experiments were carried out to investigate the performance of the proposed model. In the first experiment, various kinds of SCFGs were compared in terms of perplexity (PP). The results showed that the proposed model has much lower PP than the original model. As for training speed, the restricted grammar made the training process twenty times faster, and word grouping made it eight times faster. In the second experiment, the proposed model was used as the language model of a large-vocabulary continuous speech recognition (LVCSR) system. The results showed that the proposed model performed as well as bigram and trigram models, and that combining the trigram model with the proposed model further improved the word error rate (WER).
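To make the complexity figures in the abstract concrete, the following is a minimal Python sketch of the inside pass (the forward half of inside-outside estimation) for a stochastic CFG in Chomsky normal form. It is an illustration under stated assumptions, not the authors' implementation: the function name `inside_probability`, the toy grammar, and the rule form A -> B A (the right child is the head and passes its label up) are all hypothetical and are not given in the abstract. The chart has O(L^3) combinations of span and split point, and the innermost loop runs over the binary rules. A general CFG can have up to N^3 such rules, whereas a grammar restricted to rules of the assumed form A -> B A has at most N^2, which is consistent with the reduction from O(N^3 L^3) to O(N^2 L^3).

```python
from collections import defaultdict


def inside_probability(words, lexical, binary, start):
    """Return P(words) under a stochastic CFG in Chomsky normal form.

    lexical: dict mapping (A, word) -> P(A -> word)
    binary:  dict mapping (A, B, C) -> P(A -> B C)
    start:   nonterminal that must cover the whole sentence
    """
    n = len(words)
    # beta[i][j][A] = probability that A derives words[i:j]
    beta = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]

    # Width-1 spans: lexical rules only.
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                beta[i][i + 1][a] += p

    # Wider spans: O(L^3) choices of (i, k, j), times the number of rules.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    left = beta[i][k].get(b, 0.0)
                    right = beta[k][j].get(c, 0.0)
                    if left and right:
                        beta[i][j][a] += p * left * right

    return beta[0][n].get(start, 0.0)


# Hypothetical toy grammar. Every binary rule has the assumed
# dependency form A -> B A: the modifier B precedes the head A.
TOY_BINARY = {
    ("V", "N", "V"): 0.4,  # a noun phrase modifies a verb
    ("N", "N", "N"): 0.3,  # a noun-like word modifies a noun
}
TOY_LEXICAL = {
    ("V", "run"): 0.6,
    ("N", "dogs"): 0.5,
    ("N", "big"): 0.2,
}

if __name__ == "__main__":
    sentence = ["big", "dogs", "run"]
    print(inside_probability(sentence, TOY_LEXICAL, TOY_BINARY, start="V"))
```

The toy sentence has two dependency structures ("big" modifying "dogs", or "big" modifying the verb phrase), and the inside pass sums their probabilities, giving approximately 0.0168. One reading of the one-eighth figure for word grouping, offered here as an interpretation and not as a statement from the paper, is that treating each group of two words as a single unit halves the effective sentence length, and (L/2)^3 = L^3/8.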

