ISCA Archive Eurospeech 2001

Two-stage probabilistic approach to text segmentation

Yi-Chia Chen, Yi-Chung Lin

For telephone-based spoken dialogue systems, responses to users should be specific and short. There is therefore a strong need to segment a topical text into specific event segments that can be used to answer users' queries. However, the lexical cohesion approach, which has been widely used to segment text into topics, is not suitable for segmenting text into smaller units such as events. In this paper, we present a two-stage approach to partitioning text into event segments. In the first stage, a trigram chunk tagger assigns the segmentation tags. In the second stage, unreliable segmentation tags are detected and then verified by a probabilistic verification model. Compared with the chunk tagger, the verification model can exploit more contextual information and is less sensitive to the sparseness of the training data. Experimental results show that the proposed two-stage approach significantly outperforms the chunk tagger approach, with improvements in precision and recall of 27% to 83% across the different test tasks.
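The abstract describes the pipeline only at a high level. As a rough illustration of the general tag-then-verify idea (a minimal sketch on toy data, not the authors' model), stage 1 below is a trigram (second-order HMM) tagger decoded with Viterbi over two hypothetical tags: B (a unit opens an event segment) and I (it continues one). A tag is treated as unreliable when flipping it alone lowers the sequence log score by less than a margin, which is a crude stand-in for a proper confidence estimate. Stage 2 re-decides only those tags with a hypothetical verifier that looks at cue words in a window of two units on either side. The function names, the cue list, the margin, and every probability are assumptions made for illustration.

```python
import math

# Hypothetical toy model: tag set, cue words and all probabilities are made up
# for illustration; they are not taken from the paper.
TAGS = ("B", "I")                      # B = unit starts a new event segment, I = continues one
CUES = {"then", "later", "meanwhile"}  # hypothetical boundary cue words


def is_cue(unit):
    """True if the (non-empty) unit opens with a cue word."""
    return unit.split()[0].lower() in CUES


def trans(t2, t1, t):
    """Trigram transition P(t | t2, t1); '<s>' pads the start of the text."""
    if t1 == "<s>":
        p_b = 0.99                         # the first unit almost always opens a segment
    elif t1 == "B":
        p_b = 0.05                         # one-unit segments are rare
    else:
        p_b = 0.3 if t2 == "I" else 0.15   # after two I tags a boundary is likelier
    return p_b if t == "B" else 1.0 - p_b


def emit(unit, tag):
    """Emission score P(unit | tag), reduced to 'does the unit open with a cue word?'."""
    if tag == "B":
        return 0.6 if is_cue(unit) else 0.2
    return 0.1 if is_cue(unit) else 0.4


def seq_logprob(units, tags):
    """Log score of a complete tag sequence under the trigram model."""
    t2, t1, total = "<s>", "<s>", 0.0
    for unit, t in zip(units, tags):
        total += math.log(trans(t2, t1, t)) + math.log(emit(unit, t))
        t2, t1 = t1, t
    return total


def viterbi(units):
    """Stage 1: second-order Viterbi; each state (t_prev, t_cur) keeps its best path."""
    best = {("<s>", "<s>"): (0.0, [])}
    for unit in units:
        new = {}
        for (t2, t1), (score, path) in best.items():
            for t in TAGS:
                s = score + math.log(trans(t2, t1, t)) + math.log(emit(unit, t))
                if (t1, t) not in new or s > new[(t1, t)][0]:
                    new[(t1, t)] = (s, path + [t])
        best = new
    return max(best.values())[1]


def unreliable(units, tags, margin=1.0):
    """Flag positions whose tag beats its single-tag flip by less than `margin` nats."""
    base = seq_logprob(units, tags)
    flagged = []
    for i, t in enumerate(tags):
        flipped = tags[:i] + ["I" if t == "B" else "B"] + tags[i + 1:]
        if base - seq_logprob(units, flipped) < margin:
            flagged.append(i)
    return flagged


def verify(units, i, window=2):
    """Stage 2 (hypothetical verifier): re-score unit i using cues in a wider window."""
    score = math.log(emit(units[i], "B") / emit(units[i], "I"))
    for j in range(max(0, i - window), min(len(units), i + window + 1)):
        if j != i and is_cue(units[j]):
            score -= 0.5 / abs(i - j)  # a nearby cue suggests the boundary lies there instead
    return "B" if score > 0 else "I"


def segment(units, margin=1.0):
    tags = viterbi(units)                      # stage 1: tag every unit
    for i in unreliable(units, tags, margin):  # stage 2: re-decide only the doubtful tags
        tags[i] = verify(units, i)
    return tags


if __name__ == "__main__":
    units = ["The fire started at noon", "Firefighters arrived",
             "Then the road was closed", "Traffic stopped"]
    print(segment(units))  # ['B', 'I', 'B', 'I']
```

Restricting the verifier to flagged positions keeps the wider-context model away from tags the tagger is already confident about. In the example, only the cue-initial third unit falls under the margin, and the verifier keeps it as a boundary.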

doi: 10.21437/Eurospeech.2001-233

Cite as: Chen, Y.-C., Lin, Y.-C. (2001) Two-stage probabilistic approach to text segmentation. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 1081-1084, doi: 10.21437/Eurospeech.2001-233

  author={Yi-Chia Chen and Yi-Chung Lin},
  title={{Two-stage probabilistic approach to text segmentation}},
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},