ISCA Archive Eurospeech 1999

Learning a lightweight robust deterministic parser

Aboy Wong, Dekai Wu

We describe a method for automatically learning a parser from labeled, bracketed corpora. The result is a fast, robust, lightweight parser suitable for real-time dialog systems and similar applications. Unlike ordinary parsers, all grammatical knowledge is captured in the learned decision trees, so no explicit phrase-structure grammar is needed. The architecture is also robust, since the input need not fit pre-specified productions. Even without using specific lexical features, we have achieved respectable labeled bracket accuracies of about 81% precision and 82% recall. Processing speed is more than 500 words per CPU second. We keep the parameter space small (in comparison to other statistically learned parsers) by using only part-of-speech tags and constituent labels as features. Without any optimization, the decision trees consume only 6M of memory, making it possible to run the parser on platforms with limited memory. The learning method is readily applicable to other languages. Preliminary experiments on a Chinese corpus (about 3000 sentences from Chinese primary school text) have yielded results comparable to those for English.
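To make the reported metric concrete, the following is a minimal sketch of how labeled bracket precision and recall are commonly computed from a gold tree and a predicted tree. It is not the authors' evaluation code: the nested-tuple tree encoding, the function names `labeled_brackets` and `precision_recall`, and the toy trees are all assumptions made for illustration. Leaves are part-of-speech tags, in keeping with the abstract's statement that no lexical features are used.

```python
from collections import Counter

def labeled_brackets(tree, start=0):
    """Collect (label, start, end) spans from a nested-tuple tree.

    Hypothetical encoding: a tree is (label, child, child, ...), where a
    child is either another tree or a part-of-speech tag string (a leaf).
    Returns the list of spans and the position after the last leaf.
    """
    label, *children = tree
    spans = []
    pos = start
    for child in children:
        if isinstance(child, str):
            pos += 1  # a leaf covers exactly one word position
        else:
            child_spans, pos = labeled_brackets(child, pos)
            spans.extend(child_spans)
    spans.append((label, start, pos))
    return spans, pos

def precision_recall(gold_tree, predicted_tree):
    """Labeled bracket precision and recall for one sentence."""
    gold = Counter(labeled_brackets(gold_tree)[0])
    pred = Counter(labeled_brackets(predicted_tree)[0])
    matched = sum((gold & pred).values())  # multiset intersection
    precision = matched / sum(pred.values()) if pred else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0
    return precision, recall

# Toy example (invented for illustration): the prediction misses the inner NP.
gold = ("S", ("NP", "DT", "NN"), ("VP", "VBD", ("NP", "DT", "NN")))
pred = ("S", ("NP", "DT", "NN"), ("VP", "VBD", "DT", "NN"))
print(precision_recall(gold, pred))
```

In this toy case every predicted bracket is correct, but one of the four gold brackets is missing, so precision is higher than recall. Corpus-level figures such as those quoted above are typically obtained by summing matched, predicted, and gold bracket counts over all test sentences before dividing.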

doi: 10.21437/Eurospeech.1999-453

Cite as: Wong, A., Wu, D. (1999) Learning a lightweight robust deterministic parser. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 2047-2050, doi: 10.21437/Eurospeech.1999-453

  author={Aboy Wong and Dekai Wu},
  title={{Learning a lightweight robust deterministic parser}},
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},