Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Exploiting Unlabeled Data Using Multiple Classifiers for Improved Natural Language Call-Routing

Ruhi Sarikaya, Hong-Kwang Jeff Kuo, Vaibhava Goel, Yuqing Gao

IBM T.J. Watson Research Center, Yorktown Heights, NY, USA

This paper presents an unsupervised method that uses a limited amount of labeled data and a large pool of unlabeled data to improve natural language call-routing performance. The method uses multiple classifiers to select a subset of the unlabeled data to augment the limited labeled data. We evaluated four widely used text classification algorithms: Naive Bayes Classification (NBC), Support Vector Machines (SVM), Boosting, and Maximum Entropy (MaxEnt). NBC was found to be the poorest performer of the four. Combining SVM, Boosting, and MaxEnt resulted in significant improvements in call classification accuracy over any single classifier across varying amounts of labeled data.
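The selection step described above can be illustrated with a minimal sketch. The abstract does not state the selection criterion, so the vote-threshold rule used here is an assumption, and the keyword-based routers are hypothetical stand-ins for trained SVM, Boosting, and MaxEnt models. The names `select_by_agreement`, `keyword_router`, and `min_votes` are illustrative and do not come from the paper.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# A classifier maps an utterance to a route label.
Classifier = Callable[[str], str]


def keyword_router(rules: Dict[str, str], default: str) -> Classifier:
    """Hypothetical stand-in for a trained classifier: the first keyword
    found in the utterance decides the route, otherwise `default`."""
    def classify(utterance: str) -> str:
        text = utterance.lower()
        for keyword, label in rules.items():
            if keyword in text:
                return label
        return default
    return classify


def select_by_agreement(
    unlabeled: Sequence[str],
    classifiers: Sequence[Classifier],
    min_votes: int,
) -> List[Tuple[str, str]]:
    """Keep an unlabeled utterance only if at least `min_votes` classifiers
    predict the same label (assumed rule, not taken from the paper)."""
    selected = []
    for utterance in unlabeled:
        votes = Counter(clf(utterance) for clf in classifiers)
        label, count = votes.most_common(1)[0]
        if count >= min_votes:
            selected.append((utterance, label))
    return selected


# Three toy routers that disagree on some inputs.
clf_a = keyword_router({"bill": "billing", "password": "tech_support"}, "other")
clf_b = keyword_router(
    {"charge": "billing", "bill": "billing", "password": "tech_support"}, "other"
)
clf_c = keyword_router({"bill": "billing", "reset": "tech_support"}, "other")

unlabeled_pool = [
    "question about my bill",
    "reset my password",
    "unexpected charge",
]

# Require unanimous agreement among the three classifiers.
auto_labeled = select_by_agreement(
    unlabeled_pool, [clf_a, clf_b, clf_c], min_votes=3
)

# The automatically labeled utterances would then be appended to the
# small labeled set before retraining.
labeled_seed = [("I want to pay my bill", "billing")]
augmented = labeled_seed + auto_labeled
```

Under this assumed rule, raising `min_votes` trades coverage for label purity: fewer unlabeled utterances are added, but the ones that are added carry labels the classifiers agree on.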


Bibliographic reference. Sarikaya, Ruhi / Kuo, Hong-Kwang Jeff / Goel, Vaibhava / Gao, Yuqing (2005): "Exploiting unlabeled data using multiple classifiers for improved natural language call-routing", In INTERSPEECH-2005, 433-436.