Intent Discovery Through Unsupervised Semantic Text Clustering

Padmasundari, Srinivas Bangalore


Conversational systems need to understand spoken language to converse with a human in a meaningful, coherent manner. This understanding of human language (spoken language understanding, SLU) is operationalized by identifying intents and entities. While classification methods that rely on labeled data are often used for SLU, creating large supervised datasets is extremely tedious and time-consuming. This paper presents a practical approach that automates intent discovery on unlabeled datasets of human language text through clustering techniques. We explore a range of text representations and clustering methods, and we validate clustering stability with quantitative metrics such as the Adjusted Rand Index (ARI). The final alignment of clusters to semantic intents is determined through consensus labelling. Our experiments on public datasets demonstrate the effectiveness of our approach, which generates homogeneous clusters with 89% cluster accuracy and leads to better semantic intent alignments. Furthermore, we illustrate that clustering offers an alternative and effective way to mine sentence variants that can aid in bootstrapping SLU models.
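The abstract names two evaluation steps: comparing clusterings with the Adjusted Rand Index (ARI) to check stability, and consensus labelling to align each cluster with an intent. The sketch below illustrates both in plain Python. It is not the authors' code. The function names are hypothetical, and it assumes that consensus labelling means a majority vote over the annotated intents of a cluster's members and that cluster accuracy is the fraction of utterances whose intent matches their cluster's consensus label. The paper may define either step differently.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """ARI between two clusterings of the same items (1.0 = identical partitions)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must cover the same items")
    n = len(labels_a)
    # Contingency counts n_ij: items put in cluster i by A and cluster j by B.
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total = comb(n, 2)
    expected = sum_a * sum_b / total if total else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case: both partitions are all-singletons or one cluster.
        return 1.0
    return (index - expected) / (max_index - expected)


def consensus_labels(cluster_ids, intents):
    """Hypothetical consensus step: label each cluster with its majority intent."""
    members = {}
    for cid, intent in zip(cluster_ids, intents):
        members.setdefault(cid, Counter())[intent] += 1
    return {cid: counts.most_common(1)[0][0] for cid, counts in members.items()}


def cluster_accuracy(cluster_ids, intents):
    """Fraction of items whose intent equals their cluster's consensus label."""
    mapping = consensus_labels(cluster_ids, intents)
    hits = sum(mapping[cid] == intent for cid, intent in zip(cluster_ids, intents))
    return hits / len(intents)


if __name__ == "__main__":
    # Two runs (e.g. different random seeds) that find the same partition.
    run_1 = [0, 0, 0, 1, 1, 2]
    run_2 = [1, 1, 1, 0, 0, 2]
    print(adjusted_rand_index(run_1, run_2))
    intents = ["book_flight", "book_flight", "weather", "weather", "weather", "greet"]
    print(consensus_labels(run_1, intents))
    print(cluster_accuracy(run_1, intents))
```

ARI ignores the arbitrary cluster IDs, so the two runs above score 1.0 even though their labels are permuted. This is why ARI suits stability checks across repeated clustering runs. The majority-vote mapping leaves one "weather" utterance in the "book_flight" cluster, giving a cluster accuracy of 5/6.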


DOI: 10.21437/Interspeech.2018-2436

Cite as: Padmasundari, Bangalore, S. (2018) Intent Discovery Through Unsupervised Semantic Text Clustering. Proc. Interspeech 2018, 606-610, DOI: 10.21437/Interspeech.2018-2436.


@inproceedings{Padmasundari2018,
  author={Padmasundari and Srinivas Bangalore},
  title={Intent Discovery Through Unsupervised Semantic Text Clustering},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={606--610},
  doi={10.21437/Interspeech.2018-2436},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2436}
}