Third International Conference on Spoken Language Processing (ICSLP 94)
This paper presents a system for computer-assisted grammar construction (CAGC) and its application in speech processing. The CAGC system is designed to infer linguistically motivated, broad-coverage stochastic context-free grammars (SCFGs) from large corpora without requiring significant manual effort. Our approach uses an extended inside-outside learning algorithm to train a hybrid SCFG from a bracketed training set. The bracketing information is derived by an automatic surface bracketing system (AUTO) designed specifically for this purpose. Experimental results, evaluated with the Parseval metrics, show that the CAGC system can infer a grammar from a subset of the Wall Street Journal (WSJ) tagged text corpus, and that the inferred grammar achieves high coverage and good precision. As an application, the inferred grammar is used as a language model for rescoring the N-best outputs of a speech recognizer.
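The abstract does not spell out how the inside-outside algorithm is extended to use bracketed training data. A minimal sketch of the inside pass follows. It assumes a grammar in Chomsky normal form. It also assumes the extension works like the partially bracketed variant of Pereira and Schabes (1992), where any constituent that crosses a training bracket gets zero probability. The rule encodings and the names `inside`, `crosses` and `sentence_prob` are hypothetical and used only for illustration.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical encodings for a CNF grammar (assumption, not from the paper).
Binary = Dict[Tuple[str, str, str], float]   # (A, B, C) -> P(A -> B C)
Lexical = Dict[Tuple[str, str], float]       # (A, w)    -> P(A -> w)
Span = Tuple[int, int]                       # half-open word span [i, j)


def crosses(span: Span, brackets: FrozenSet[Span]) -> bool:
    """True if `span` partially overlaps some bracket (neither nested nor disjoint)."""
    i, j = span
    return any(i < k < j < l or k < i < l < j for k, l in brackets)


def inside(words: List[str], binary: Binary, lexical: Lexical,
           brackets: FrozenSet[Span] = frozenset()) -> Dict[Tuple[str, int, int], float]:
    """Inside probabilities beta[(A, i, j)] = P(A =>* words[i:j]).

    Assumed bracket constraint: spans that cross a training bracket keep
    probability zero, so only bracket-compatible derivations contribute.
    """
    n = len(words)
    beta: Dict[Tuple[str, int, int], float] = defaultdict(float)
    # Width-1 spans: lexical rules A -> w (a single word never crosses a bracket).
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                beta[(a, i, i + 1)] += p
    # Wider spans, shortest first: sum over split points k and rules A -> B C.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            if crosses((i, j), brackets):
                continue  # incompatible with the bracketing: leave at zero
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    left = beta.get((b, i, k), 0.0)
                    right = beta.get((c, k, j), 0.0)
                    if left and right:
                        beta[(a, i, j)] += p * left * right
    return beta


def sentence_prob(words, binary, lexical, start="S", brackets=frozenset()):
    """P(words | grammar), summed over bracket-compatible parses only."""
    return inside(words, binary, lexical, brackets).get((start, 0, len(words)), 0.0)


# Toy ambiguous grammar (hypothetical): S -> S S (0.4) | "x" (0.6).
binary = {("S", "S", "S"): 0.4}
lexical = {("S", "x"): 0.6}
words = ["x", "x", "x"]
print(sentence_prob(words, binary, lexical))  # sums ((x x) x) and (x (x x))
print(sentence_prob(words, binary, lexical, brackets=frozenset({(0, 2)})))  # only ((x x) x)
```

Without brackets, both analyses of "x x x" contribute 0.4 × 0.6 × 0.144 = 0.03456 each, giving 0.06912. The bracket (0, 2) rules out the span (1, 3), which leaves only 0.03456. The outside pass and the rule re-estimation step are omitted here. The sketch only shows how bracketing prunes the chart and so narrows the derivations that training can credit.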
Bibliographic reference. Shih, H.-H. / Young, Steve J. (1994): "Computer assisted grammar construction", In ICSLP-1994, 855-858.