11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Sparse Representations for Text Categorization

Tara N. Sainath (1), Sameer R. Maskey (1), Dimitri Kanevsky (1), Bhuvana Ramabhadran (1), David Nahamoo (1), Julia Hirschberg (2)

(1) IBM T.J. Watson Research Center, USA
(2) Columbia University, USA

Sparse representations (SRs) are often used to characterize a test signal using a few support training examples, and allow the number of supports to be adapted to the specific signal being categorized. Given the good performance of SRs compared to other classifiers for both image and phonetic classification, in this paper we extend the use of SRs to text classification, a method which has thus far not been explored for this domain. Specifically, we demonstrate how sparse representations can be used for text classification and how their performance varies with the vocabulary size of the document features. In addition, we show that this method offers promising results over the Naive Bayes (NB) classifier, a standard classifier used for text classification, thus introducing an alternative class of methods for text categorization.
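The general SR classification scheme the abstract describes — reconstruct a test document as a sparse combination of a few training examples, then assign the class whose supports best explain it — can be sketched as follows. This is a minimal illustration, not the authors' exact formulation: the `omp` sparse solver (greedy orthogonal matching pursuit), the toy term-frequency matrix, and the residual-based class decision are all assumptions for exposition.

```python
import numpy as np

def omp(H, y, n_nonzero=5):
    """Greedy orthogonal matching pursuit (illustrative sparse solver):
    approximate y ~ H @ beta with at most n_nonzero active coefficients."""
    residual = y.astype(float).copy()
    support = []
    beta = np.zeros(H.shape[1])
    for _ in range(n_nonzero):
        # pick the training column most correlated with the current residual
        j = int(np.argmax(np.abs(H.T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares refit of the coefficients on the current support
        coef, *_ = np.linalg.lstsq(H[:, support], y, rcond=None)
        beta[:] = 0.0
        beta[support] = coef
        residual = y - H @ beta
    return beta

def sr_classify(H, labels, y, n_nonzero=5):
    """Assign y to the class whose supporting training examples
    yield the smallest reconstruction residual."""
    beta = omp(H, y, n_nonzero)
    best_class, best_residual = None, np.inf
    for c in set(labels):
        mask = np.array([lab == c for lab in labels])
        r = np.linalg.norm(y - H[:, mask] @ beta[mask])
        if r < best_residual:
            best_class, best_residual = c, r
    return best_class

# Toy example: columns of H are term-frequency vectors of training documents
H = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9],
              [0.1, 0.0, 0.1, 0.0]])
labels = [0, 0, 1, 1]          # class of each training document
y = np.array([1.0, 0.05, 0.1]) # test document, closest to class-0 columns
print(sr_classify(H, labels, y, n_nonzero=2))
```

Because the solver picks only a handful of supports, the number of training examples used adapts per test document, which is the property the abstract contrasts with fixed-form classifiers such as Naive Bayes.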


Bibliographic reference.  Sainath, Tara N. / Maskey, Sameer R. / Kanevsky, Dimitri / Ramabhadran, Bhuvana / Nahamoo, David / Hirschberg, Julia (2010): "Sparse representations for text categorization", In INTERSPEECH-2010, 2266-2269.