ISCA Archive ICSLP 1994

Computer assisted grammar construction

H.-H. Shih, Steve J. Young

This paper presents a system for computer assisted grammar construction (CAGC) and its application in speech processing. The CAGC system is designed to infer linguistically motivated, broad-coverage stochastic context-free grammars (SCFGs) for large corpora without requiring significant manual contributions. Our approach uses an extended inside-outside learning algorithm [1] to train a hybrid SCFG [2] from a bracketed training set. The bracketing information is derived by an automatic surface bracketing system (AUTO) specifically designed for this purpose [3]. Experimental results, evaluated using Parseval metrics [4], demonstrate that the CAGC system is capable of inferring a grammar from a subset of the Wall Street Journal (WSJ) tagged text corpus and that the inferred grammar achieves high coverage and good precision. As an application, the inferred grammar acts as a language model for rescoring N-best outputs from a speech recognizer [5].
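For readers unfamiliar with Parseval-style evaluation, the following is a minimal sketch of how bracket precision, bracket recall, and crossing brackets are commonly computed for a single sentence. It is not the evaluation code used in the paper: the function name `parseval_scores`, the representation of constituents as half-open `(start, end)` word-index spans, and the example spans are all assumptions made for illustration.

```python
from collections import Counter


def parseval_scores(gold_spans, test_spans):
    """Return (precision, recall, crossing) for one sentence.

    Spans are hypothetical (start, end) word-index pairs, end exclusive.
    Brackets are compared unlabelled, as multisets.
    """
    gold = Counter(gold_spans)
    test = Counter(test_spans)

    # Brackets that appear in both the gold parse and the test parse.
    matched = sum((gold & test).values())
    precision = matched / sum(test.values()) if test else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0

    # A test bracket "crosses" a gold bracket when the two overlap
    # but neither contains the other.
    crossing = sum(
        1
        for (s, e) in test_spans
        if any(gs < s < ge < e or s < gs < e < ge for (gs, ge) in gold_spans)
    )
    return precision, recall, crossing


# Illustrative spans for a five-word sentence (made up for this sketch).
gold = [(0, 5), (0, 2), (2, 5), (3, 5)]
test = [(0, 5), (0, 2), (1, 3), (3, 5)]
precision, recall, crossing = parseval_scores(gold, test)
```

In this example three of the four test brackets match gold brackets, and the test bracket `(1, 3)` crosses the gold bracket `(0, 2)`. Corpus-level figures are usually obtained by summing the matched, gold, and test bracket counts over all sentences before dividing.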

