Interspeech'2005 - Eurospeech
This paper presents an unsupervised method that uses a limited amount of labeled data and a large pool of unlabeled data to improve natural language call routing performance. The method uses multiple classifiers to select a subset of the unlabeled data, which is then used to augment the limited labeled data. We evaluated four widely used text classification algorithms: Naive Bayes Classification (NBC), Support Vector Machines (SVM), Boosting, and Maximum Entropy (MaxEnt). NBC was found to be the poorest performer of the four. Combining SVM, Boosting, and MaxEnt yielded significant improvements in call classification accuracy over any single classifier across varying amounts of labeled data.
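The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the selection criterion, so the agreement rule used here (keep an unlabeled utterance only when at least `min_votes` classifiers assign it the same route) is an assumption, not the authors' method. The function name `select_by_agreement`, the toy keyword classifiers, and the example utterances are all hypothetical stand-ins for trained SVM, Boosting, and MaxEnt models.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# A classifier here is any function mapping an utterance to a route label.
Classifier = Callable[[str], str]


def select_by_agreement(
    classifiers: Sequence[Classifier],
    unlabeled: Sequence[str],
    min_votes: int,
) -> List[Tuple[str, str]]:
    """Return (utterance, label) pairs on which at least min_votes classifiers agree.

    Assumed criterion: the abstract only says that multiple classifiers
    select a subset of the unlabeled data.
    """
    selected = []
    for utterance in unlabeled:
        votes = Counter(clf(utterance) for clf in classifiers)
        label, count = votes.most_common(1)[0]
        if count >= min_votes:
            selected.append((utterance, label))
    return selected


# Hypothetical keyword rules standing in for trained models.
def toy_svm(text: str) -> str:
    return "billing" if "bill" in text else "support"


def toy_boosting(text: str) -> str:
    return "billing" if "bill" in text or "charge" in text else "support"


def toy_maxent(text: str) -> str:
    return "billing" if "pay" in text or "bill" in text else "support"


labeled = [("question about my bill", "billing")]
pool = [
    "my bill is wrong",
    "why was I charged twice",
    "my phone will not turn on",
]

# Require unanimous agreement among the three classifiers.
auto_labeled = select_by_agreement(
    [toy_svm, toy_boosting, toy_maxent], pool, min_votes=3
)

# The automatically labeled subset augments the small labeled set;
# in a real system the classifiers would then be retrained on it.
augmented = labeled + auto_labeled
```

In this toy run the second utterance is left out because the three stand-in classifiers disagree on it, which is the intended effect of such a filter: only utterances with consistent automatic labels are added to the training set.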
Bibliographic reference. Sarikaya, Ruhi / Kuo, Hong-Kwang Jeff / Goel, Vaibhava / Gao, Yuqing (2005): "Exploiting unlabeled data using multiple classifiers for improved natural language call-routing", In INTERSPEECH-2005, 433-436.