ISCA Archive Interspeech 2008

A rank-predicted pseudo-greedy approach to efficient text selection from large-scale corpus for maximum coverage of target units

Wei Li, Qiang Huo

Efficiently selecting a minimum amount of text from a large-scale text corpus to achieve maximum coverage of target units is an important problem in spoken language processing. In this paper, this text selection problem is first formulated as a maximum coverage problem with a knapsack constraint (MCK). An efficient rank-predicted pseudo-greedy approach is then proposed to solve it. Experiments on a Chinese text selection task are conducted to verify the efficiency of the proposed approach. Experimental results show that, compared with the traditional greedy approach, our approach can significantly improve text selection speed without sacrificing the coverage score.
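As background for the MCK formulation and the "traditional greedy approach" used as the baseline, here is a minimal Python sketch of a common cost-normalized greedy heuristic for maximum coverage under a knapsack budget. Each sentence is treated as a set of target units with a positive cost, and the total cost of the selected sentences must not exceed a budget. At every step, the heuristic picks the sentence that still fits the remaining budget and has the largest newly covered unit weight per unit cost. This is not the rank-predicted pseudo-greedy method proposed in the paper, whose details are not described here. The function name `greedy_select`, the use of sentence length as cost, and the default unit weight of 1 are illustrative assumptions.

```python
from typing import Dict, List, Optional, Sequence, Set


def greedy_select(
    sentences: Sequence[Set[str]],
    costs: Sequence[int],
    budget: int,
    weights: Optional[Dict[str, float]] = None,
) -> List[int]:
    """Hypothetical cost-normalized greedy baseline for maximum coverage
    with a knapsack constraint (illustrative sketch, not the paper's method).

    sentences[i]: set of target units contained in sentence i (assumed).
    costs[i]: positive cost of sentence i, e.g. its length (assumed).
    budget: upper bound on the total cost of the selected sentences.
    weights: optional per-unit weights; unlisted units default to 1.0.
    Returns the indices of the selected sentences in selection order.
    """
    weights = weights or {}
    covered: Set[str] = set()
    selected: List[int] = []
    remaining = budget
    candidates = set(range(len(sentences)))

    while candidates:
        best_idx, best_ratio = -1, 0.0
        # Rescan every remaining candidate; ties go to the smallest index.
        for i in sorted(candidates):
            if costs[i] > remaining:
                continue  # does not fit in the remaining budget
            # Weight of units this sentence would newly cover.
            gain = sum(weights.get(u, 1.0) for u in sentences[i] - covered)
            ratio = gain / costs[i]
            if ratio > best_ratio:
                best_idx, best_ratio = i, ratio
        if best_idx < 0:
            break  # no affordable sentence adds new coverage
        selected.append(best_idx)
        covered |= sentences[best_idx]
        remaining -= costs[best_idx]
        candidates.discard(best_idx)
    return selected


if __name__ == "__main__":
    sentences = [{"a", "b", "c"}, {"a", "d"}, {"d", "e", "f", "g"}, {"b"}]
    costs = [2, 2, 3, 1]
    print(greedy_select(sentences, costs, budget=5))  # [0, 2]
```

Each iteration re-evaluates the marginal gain of every remaining sentence, so the baseline performs on the order of n gain computations per selected sentence. On a large-scale corpus this repeated rescanning dominates the running time, and this is the cost that a faster selection strategy aims to reduce.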


doi: 10.21437/Interspeech.2008-460

Cite as: Li, W., Huo, Q. (2008) A rank-predicted pseudo-greedy approach to efficient text selection from large-scale corpus for maximum coverage of target units. Proc. Interspeech 2008, 1658-1661, doi: 10.21437/Interspeech.2008-460

@inproceedings{li08f_interspeech,
  author={Wei Li and Qiang Huo},
  title={{A rank-predicted pseudo-greedy approach to efficient text selection from large-scale corpus for maximum coverage of target units}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={1658--1661},
  doi={10.21437/Interspeech.2008-460}
}