Sixth European Conference on Speech Communication and Technology
We describe a method for automatically learning a parser from labeled, bracketed corpora. The result is a fast, robust, lightweight parser suitable for real-time dialog systems and similar applications. Unlike conventional parsers, all grammatical knowledge is captured in the learned decision trees, so no explicit phrase-structure grammar is needed. The architecture is also robust, since the input need not fit pre-specified productions. Even without lexical features, we achieve respectable labeled-bracket accuracy of about 81% precision and 82% recall. Processing speed exceeds 500 words per CPU second. We keep the parameter space small compared with other statistically learned parsers by using only part-of-speech tags and constituent labels as features. Without any optimization, the decision trees consume only 6 MB of memory, so the parser can run on platforms with limited memory. The learning method is readily applicable to other languages: preliminary experiments on a Chinese corpus of about 3000 sentences from Chinese primary school texts have yielded results comparable to those for English.
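The labeled-bracket precision and recall figures quoted above can be illustrated with a minimal sketch. It assumes a tree encoding of nested `(label, children)` tuples whose leaves are part-of-speech tag strings, and it counts every constituent as a `(label, start, end)` bracket. The function names `collect_brackets` and `labeled_precision_recall`, the tree encoding, and the example trees are all hypothetical and are not taken from the paper, whose exact scoring conventions may differ.

```python
from collections import Counter


def collect_brackets(tree, start=0):
    """Return (brackets, end) for a tree; brackets are (label, start, end) spans."""
    # A leaf is a plain string (a part-of-speech tag) covering exactly one word.
    if isinstance(tree, str):
        return [], start + 1
    label, children = tree
    brackets = []
    pos = start
    for child in children:
        child_brackets, pos = collect_brackets(child, pos)
        brackets.extend(child_brackets)
    # The constituent spans from its first word up to (not including) pos.
    brackets.append((label, start, pos))
    return brackets, pos


def labeled_precision_recall(gold_tree, predicted_tree):
    """Labeled-bracket precision and recall of a predicted tree against a gold tree."""
    gold, _ = collect_brackets(gold_tree)
    pred, _ = collect_brackets(predicted_tree)
    # Multiset intersection: a bracket matches only if label and span both agree.
    matched = sum((Counter(gold) & Counter(pred)).values())
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example: the predicted tree misses the VP constituent.
gold_tree = ("S", [("NP", ["DT", "NN"]),
                   ("VP", ["VBD", ("NP", ["DT", "NN"])])])
predicted_tree = ("S", [("NP", ["DT", "NN"]), "VBD", ("NP", ["DT", "NN"])])

precision, recall = labeled_precision_recall(gold_tree, predicted_tree)
```

In this example every predicted bracket also appears in the gold tree, so precision is perfect, while recall drops because one of the four gold brackets (the VP) is never proposed.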
Bibliographic reference. Wong, Aboy / Wu, Dekai (1999): "Learning a lightweight robust deterministic parser", In EUROSPEECH'99, 2047-2050.