Most existing statistical surface realizers either rely on handcrafted grammars for coverage or are tuned to specific applications. This paper describes an initial effort toward building a statistical surface realization model that provides both precision and coverage. On the Penn TreeBank and Proposition Bank corpora, we trained a Maximum Entropy model that, given a predicate-argument semantic representation, predicts both the surface form realizing a semantic concept and the ordering of sibling semantic concepts relative to their parent. Initial results show that precision reaches 80% for predicting surface forms and 90% for predicting orderings on a held-out portion of the Penn TreeBank. We use the model to generate sentences from our domain representations, and we are currently evaluating it on a corpus collected for our in-car applications.
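The paper does not include code, but the kind of model described — a Maximum Entropy classifier mapping predicate-argument features to a surface-form label — can be sketched as multinomial logistic regression trained by gradient ascent. This is a minimal illustrative sketch only: the feature names (`pred=…`, `role=…`), the surface-form labels, and the toy data are invented for illustration and are not the paper's actual feature set.

```python
# Toy maximum-entropy (multinomial logistic regression) classifier:
# predicts a surface-form label from binary semantic features.
# Features and data below are hypothetical, for illustration only.
import math

def maxent_train(data, labels, epochs=200, lr=0.5):
    """data: list of feature-name lists; labels: parallel list of classes."""
    feats = sorted({f for x in data for f in x})
    classes = sorted(set(labels))
    w = {(c, f): 0.0 for c in classes for f in feats}
    for _ in range(epochs):
        for x, y in zip(data, labels):
            # p(c | x) ∝ exp(sum of weights for the active features)
            scores = {c: math.exp(sum(w[(c, f)] for f in x)) for c in classes}
            z = sum(scores.values())
            for c in classes:
                p = scores[c] / z
                # Gradient of the log-likelihood: observed minus expected counts
                for f in x:
                    w[(c, f)] += lr * ((1.0 if c == y else 0.0) - p)
    return w, classes

def maxent_predict(w, classes, x):
    """Return the highest-scoring class for feature list x."""
    scores = {c: sum(w.get((c, f), 0.0) for f in x) for c in classes}
    return max(scores, key=scores.get)

# Hypothetical predicate-argument features -> surface-form labels.
train_x = [["pred=show", "role=ARG1"],
           ["pred=show", "role=ARGM-TMP"],
           ["pred=find", "role=ARG1"],
           ["pred=find", "role=ARGM-LOC"]]
train_y = ["NP", "ADVP", "NP", "PP"]
weights, label_set = maxent_train(train_x, train_y)
```

In the paper's setting the features would be derived from PropBank-style predicate-argument structures, and a second model of the same form would score orderings of sibling concepts; the sketch above only shows the surface-form classification step.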
Cite as: Cheng, H., Weng, F., Hantaweepant, N., Cavedon, L., Peters, S. (2005) Training a maximum entropy model for surface realization. Proc. Interspeech 2005, 1953-1956, doi: 10.21437/Interspeech.2005-610
@inproceedings{cheng05_interspeech,
  author={Hua Cheng and Fuliang Weng and Niti Hantaweepant and Lawrence Cavedon and Stanley Peters},
  title={{Training a maximum entropy model for surface realization}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={1953--1956},
  doi={10.21437/Interspeech.2005-610}
}