Efficiently selecting a minimal amount of text from a large-scale corpus so as to maximize the coverage of certain target units is an important problem in spoken language processing. In this paper, this text selection problem is first formulated as a maximum coverage problem with a knapsack constraint (MCK). An efficient rank-predicted pseudo-greedy approach is then proposed to solve it. Experiments on a Chinese text selection task are conducted to verify the efficiency of the proposed approach. The results show that our approach significantly improves text selection speed without sacrificing coverage score, compared with the traditional greedy approach.
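For orientation, the traditional greedy baseline for a maximum coverage problem with a knapsack constraint can be sketched as follows. This is only an illustrative sketch of the generic greedy heuristic, not the rank-predicted pseudo-greedy approach proposed in the paper. The function name `greedy_select`, the representation of each sentence as a `(cost, units)` pair, and the use of a positive numeric cost (for example, sentence length) are assumptions made for this example.

```python
def greedy_select(sentences, budget):
    """Generic greedy heuristic for budgeted maximum coverage (illustrative).

    sentences: list of (cost, units) pairs, where cost is a positive number
               (assumed here to be something like sentence length) and units
               is a set of target units the sentence contains.
    budget:    maximum total cost of the selected sentences.
    Returns (chosen_indices, covered_units).
    """
    covered = set()
    chosen = []
    spent = 0
    remaining = set(range(len(sentences)))

    while remaining:
        best, best_ratio = None, 0.0
        # Full rescan of all remaining candidates at every step: this is the
        # expensive part of the plain greedy baseline on a large corpus.
        for i in sorted(remaining):
            cost, units = sentences[i]
            if spent + cost > budget:
                continue  # would violate the knapsack constraint
            gain = len(units - covered)  # newly covered units
            ratio = gain / cost          # coverage gain per unit of cost
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            break  # nothing affordable adds new coverage
        cost, units = sentences[best]
        chosen.append(best)
        covered |= units
        spent += cost
        remaining.discard(best)

    return chosen, covered
```

Each iteration rescans every remaining sentence to recompute its marginal gain, so the running time grows quickly with corpus size. That per-step cost is the kind of overhead a faster selection procedure would aim to reduce.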
Bibliographic reference. Li, Wei / Huo, Qiang (2008): "A rank-predicted pseudo-greedy approach to efficient text selection from large-scale corpus for maximum coverage of target units", In INTERSPEECH-2008, 1658-1661.