9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

A Rank-Predicted Pseudo-Greedy Approach to Efficient Text Selection from Large-Scale Corpus for Maximum Coverage of Target Units

Wei Li (1), Qiang Huo (2)

(1) University of Hong Kong, China; (2) Microsoft Research Asia, China

Efficiently selecting a minimum amount of text from a large-scale text corpus to achieve maximum coverage of certain units is an important problem in spoken language processing. In this paper, this text selection problem is first formulated as a maximum coverage problem with a knapsack constraint (MCK). An efficient rank-predicted pseudo-greedy approach is then proposed to solve it. Experiments on a Chinese text selection task are conducted to verify the efficiency of the proposed approach. Experimental results show that our approach significantly improves text selection speed without sacrificing the coverage score, compared with the traditional greedy approach.
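For readers unfamiliar with the setting, the traditional greedy baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the rank-predicted pseudo-greedy method of the paper, whose details are not given here. It is the standard cost-effectiveness greedy heuristic for budgeted maximum coverage, under these assumptions: each sentence is represented by the set of target units it contains, each sentence has a positive integer cost (for example its length), and the knapsack constraint is a total cost budget. The names `greedy_select`, `sentences`, `costs`, and `budget` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple


def greedy_select(
    sentences: Dict[str, Set[str]],
    costs: Dict[str, int],
    budget: int,
) -> Tuple[List[str], Set[str]]:
    """Greedy baseline (illustrative assumption, not the paper's method).

    Repeatedly picks the affordable sentence with the highest ratio of
    newly covered units to cost, until nothing affordable adds coverage.
    Costs are assumed to be positive.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = budget
    candidates = set(sentences)
    while candidates:
        best_id, best_ratio = None, 0.0
        # Full rescan of all candidates on every iteration; sorted only
        # to make tie-breaking deterministic.
        for sid in sorted(candidates):
            if costs[sid] > remaining:
                continue  # does not fit in the remaining budget
            gain = len(sentences[sid] - covered)
            ratio = gain / costs[sid]
            if ratio > best_ratio:
                best_id, best_ratio = sid, ratio
        if best_id is None:
            break  # no affordable sentence covers anything new
        chosen.append(best_id)
        covered |= sentences[best_id]
        remaining -= costs[best_id]
        candidates.remove(best_id)
    return chosen, covered


# Toy example: units are single letters, costs are arbitrary.
corpus = {
    "s1": {"a", "b", "c"},
    "s2": {"c", "d"},
    "s3": {"a", "b", "c", "d", "e"},
    "s4": {"e"},
}
toy_costs = {"s1": 3, "s2": 2, "s3": 6, "s4": 1}
picked, units = greedy_select(corpus, toy_costs, budget=6)
print(picked, sorted(units))
```

Note that every iteration rescans all remaining candidates and recomputes their gains, which becomes expensive on a large-scale corpus; this per-iteration cost is the kind of overhead a faster selection procedure would need to reduce.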

Bibliographic reference. Li, Wei / Huo, Qiang (2008): "A rank-predicted pseudo-greedy approach to efficient text selection from large-scale corpus for maximum coverage of target units", in INTERSPEECH-2008, 1658-1661.