This paper describes an unsupervised approach to the automated categorisation of utterances into predefined categories of symptoms (or problems) within the framework of a technical support automated agent. Utterances are classified with an iterative K-means clustering method. To improve on the lower accuracy typical of unsupervised algorithms, we analyse two enhancements of the classification algorithm. The first exploits the affinity among words by automatically extracting classes of semantically equivalent terms. The second is a disambiguation technique based on a new criterion for estimating the relevance of terms to the classification. The paper concludes with an analysis of the results of an experimental evaluation performed on a corpus of 34848 utterances.
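As a rough illustration of the iterative K-means step mentioned in the abstract, the following minimal sketch clusters utterances represented as bag-of-words vectors. The abstract does not specify the features, the similarity measure, or the initialisation, so everything here is an assumption made for illustration: whitespace tokenisation, cosine similarity, and seeding each cluster from a hand-picked utterance index (`seed_indices`). The function names are hypothetical, and neither of the two enhancements (semantic term classes, term-relevance disambiguation) is modelled.

```python
from collections import Counter
import math


def vectorise(utterance):
    """Bag-of-words term counts for one utterance (assumed representation)."""
    return Counter(utterance.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Centroid of a non-empty list of sparse term vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    n = len(vectors)
    return Counter({t: c / n for t, c in total.items()})


def kmeans(utterances, seed_indices, max_iter=20):
    """Assign each utterance to the most similar centroid, recompute the
    centroids, and repeat until the assignments stop changing.

    seed_indices is a hypothetical parameter: one utterance index per
    cluster, used as the initial centroid.
    """
    vectors = [vectorise(u) for u in utterances]
    centroids = [Counter(vectors[i]) for i in seed_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: pick the centroid with the highest similarity.
        new_labels = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: recompute each centroid from its current members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = mean_vector(members)
    return labels
```

Called on four toy utterances such as "internet connection is down", "no internet connection", "cannot send email", and "email will not send", with `seed_indices=[0, 2]`, the sketch groups the first two and the last two together. Mapping the resulting clusters onto the predefined symptom categories is a separate step that this sketch does not cover.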
Bibliographic reference. Albalate, Amparo / Dimitrov, Dimitar / Pieraccini, Roberto (2007): "Unsupervised categorisation approaches for technical support automated agents", In INTERSPEECH-2007, 1625-1628.