5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Estimation of the Probability Distributions of Stochastic Context-Free Grammars From the k-Best Derivations

Joan-Andreu Sanchez, Josť-Miguel Benedi

DSIC-Universidad Politecnica de Valencia, Spain

The use of the Inside-Outside algorithm for the estimation of the probability distributions of Stochastic Context-Free Grammars in Natural-Language processing is restricted due to the time complexity per iteration and the large number of iterations that it needs to converge. Alternatively, an algorithm based on the Viterbi score (VS) is used. This VS algorithm converges more rapidly, but obtains less competitive models. We propose a new algorithm that only considers the k-best derivations in the estimation process. The time complexity per iteration of the algorithm is practically the same as that of the VS algorithm. The proposed algorithm has been proven with a part of the Wall Street Journal task processed in the Penn Treebank project. The perplexity of a test set for the VS algorithm was 24.22 and this value was 22.73 for the proposed algorithm with k= 3, which means an improvement of 6.15%.

Full Paper

Bibliographic reference.  Sanchez, Joan-Andreu / Benedi, Josť-Miguel (1998): "Estimation of the probability distributions of stochastic context-free grammars from the k-best derivations", In ICSLP-1998, paper 1032.