Neural Text Clustering with Document-Level Attention Based on Dynamic Soft Labels

Zhi Chen, Wu Guo, Li-Rong Dai, Zhen-Hua Ling, Jun Du


In this paper, a deep learning framework is applied to text clustering, an unsupervised task in natural language processing (NLP). Since no predefined labels are available for text clustering, the deep neural network is trained in a pseudo-supervised fashion with labels generated by a pre-clustering step. To address the incorrect labels produced by the pre-clustering step, we adopt soft pseudo-labels instead of hard one-hot ones, and these labels are dynamically updated during training. In addition, we build a document-level attention mechanism over multiple documents, based on the dynamic soft pseudo-labels, to further reduce the impact of the wrong labels. Experimental results on three public databases show that our model outperforms state-of-the-art systems.
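The pseudo-supervised loop described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: k-means as the pre-clustering step, a linear softmax classifier standing in for the deep network, soft labels derived from a softmax over negative squared distances to the centroids, and a moving-average rule for the dynamic label update. The function names and the `temperature`, `momentum`, and `lr` parameters are hypothetical. The document-level attention component is not sketched, since the abstract does not specify it.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kmeans(X, k, iters=20, seed=0):
    # Plain k-means, used here as an assumed pre-clustering step.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(0)
    return centers

def initial_soft_labels(X, centers, temperature=1.0):
    # Soft pseudo-labels instead of one-hot ones: closer centroids get
    # more probability mass (assumed scheme, not the paper's).
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return softmax(-d / temperature, axis=1)

def train_with_dynamic_soft_labels(X, k, epochs=50, lr=0.5,
                                   momentum=0.9, seed=0):
    centers = kmeans(X, k, seed=seed)
    Q = initial_soft_labels(X, centers)        # current soft pseudo-labels
    n, dim = X.shape
    W, b = np.zeros((dim, k)), np.zeros(k)     # linear stand-in for the network
    for _ in range(epochs):
        P = softmax(X @ W + b, axis=1)         # model predictions
        grad = (P - Q) / n                     # gradient of mean soft cross-entropy w.r.t. logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(0)
        P = softmax(X @ W + b, axis=1)
        # Dynamic update: blend the old labels with the new predictions
        # (assumed moving-average rule).
        Q = momentum * Q + (1.0 - momentum) * P
    return Q, P
```

Calling `train_with_dynamic_soft_labels(X, k)` on an `(n_documents, n_features)` array of document vectors returns the final soft pseudo-labels and the model predictions; taking `Q.argmax(1)` would give a hard cluster assignment. A higher `momentum` keeps the labels closer to the pre-clustering result, while a lower one lets the model's own predictions reshape them faster.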


DOI: 10.21437/Interspeech.2019-1417

Cite as: Chen, Z., Guo, W., Dai, L., Ling, Z., Du, J. (2019) Neural Text Clustering with Document-Level Attention Based on Dynamic Soft Labels. Proc. Interspeech 2019, 4225-4229, DOI: 10.21437/Interspeech.2019-1417.


@inproceedings{Chen2019,
  author={Zhi Chen and Wu Guo and Li-Rong Dai and Zhen-Hua Ling and Jun Du},
  title={{Neural Text Clustering with Document-Level Attention Based on Dynamic Soft Labels}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={4225--4229},
  doi={10.21437/Interspeech.2019-1417},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1417}
}