Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

Computer Assisted Grammar Construction

H.-H. Shih, Steve J. Young

Cambridge University Engineering Department, Cambridge, England

This paper presents a system for computer assisted grammar construction (CAGC) and its application in speech processing. The CAGC system is designed to infer linguistically motivated, broad-coverage stochastic context-free grammars (SCFGs) for large corpora without requiring significant manual contributions. Our approach uses an extended inside-outside learning algorithm [1] to train a hybrid SCFG [2] from a bracketed training set. The bracketing information is derived by an automatic surface bracketing system (AUTO) specifically designed for this purpose [3]. Experimental results, evaluated using Parseval metrics [4], demonstrate that the CAGC system is capable of inferring a grammar from a subset of the Wall Street Journal (WSJ) tagged text corpus and that the inferred grammar achieves high coverage and good precision. As an application, the inferred grammar acts as a language model for rescoring N-best outputs from a speech recognizer [5].
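As an illustration of the evaluation mentioned above, the following is a minimal sketch of Parseval-style bracket scoring for a single sentence. It is not the authors' evaluation code: the function names `crosses` and `parseval`, the representation of a bracket as an unlabeled `(start, end)` word span, and the example spans are all assumptions made here for illustration, and the abstract does not say which Parseval variant or preprocessing was used.

```python
from typing import Set, Tuple

# Assumption: a bracket is an unlabeled (start, end) word span, end exclusive.
Span = Tuple[int, int]


def crosses(a: Span, b: Span) -> bool:
    """True if the two spans overlap without either containing the other."""
    return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]


def parseval(candidate: Set[Span], gold: Set[Span]) -> Tuple[float, float, int]:
    """Return (bracket precision, bracket recall, crossing-bracket count)."""
    matched = len(candidate & gold)
    precision = matched / len(candidate) if candidate else 0.0
    recall = matched / len(gold) if gold else 0.0
    # A candidate bracket counts as crossing if it crosses any gold bracket.
    crossing = sum(1 for c in candidate if any(crosses(c, g) for g in gold))
    return precision, recall, crossing


if __name__ == "__main__":
    # Hypothetical brackets for a five-word sentence.
    gold = {(0, 5), (0, 2), (2, 5), (3, 5)}
    candidate = {(0, 5), (0, 2), (1, 3), (3, 5)}
    print(parseval(candidate, gold))
```

In this hypothetical example, three of the four candidate brackets also appear in the reference bracketing, and the candidate span `(1, 3)` crosses the reference span `(0, 2)`. Corpus-level scores would normally aggregate the counts over all sentences rather than average per-sentence ratios.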

Bibliographic reference.  Shih, H.-H. / Young, Steve J. (1994): "Computer assisted grammar construction", In ICSLP-1994, 855-858.