Audio Classification Using Class-Specific Learned Descriptors

Sukanya Sonowal, Tushar Sandhan, Inkyu Choi, Nam Soo Kim


This paper presents a classification scheme for audio signals using high-level feature descriptors. The descriptor is designed to capture the relevance of each acoustic feature group (a feature set such as mel-frequency cepstral coefficients or perceptual features) in recognizing an audio class. To this end, a bank of relevance vector machine (RVM) classifiers is modeled, one for each ‘audio class’–‘feature group’ pair. The responses of an input signal to this bank of RVM classifiers form the entries of the descriptor. Each entry thus measures the proximity of the input signal to an audio class based on a single feature group. This representation offers two advantages. First, it helps to determine how effective each feature group is in classifying a specific audio class. Second, the descriptor is more discriminative than the low-level feature groups, and a simple support vector machine (SVM) classifier trained on the descriptor performs better than several state-of-the-art methods.
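The descriptor construction described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the RVM for each pair is replaced by a simple centroid-based scorer (a swapped-in stand-in, since the abstract gives no RVM details), the class names `speech` and `music` and all toy feature values are hypothetical, and the final SVM stage is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Sample = Dict[str, Vector]  # maps feature-group name -> feature vector


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class PairScorer:
    """Stand-in for one RVM (assumption): scores one (class, group) pair in (0, 1)."""

    def __init__(self, positives: List[Vector], negatives: List[Vector]):
        self.pos = centroid(positives)
        self.neg = centroid(negatives)

    def score(self, x: Vector) -> float:
        # Closer to the positive-class centroid gives a score above 0.5.
        margin = sq_dist(x, self.neg) - sq_dist(x, self.pos)
        return 1.0 / (1.0 + math.exp(-margin))


def train_bank(samples: List[Sample], labels: List[str],
               classes: List[str], groups: List[str]
               ) -> Dict[Tuple[str, str], PairScorer]:
    """Fit one one-vs-rest scorer per (audio class, feature group) pair."""
    bank = {}
    for c in classes:
        for g in groups:
            pos = [s[g] for s, y in zip(samples, labels) if y == c]
            neg = [s[g] for s, y in zip(samples, labels) if y != c]
            bank[(c, g)] = PairScorer(pos, neg)
    return bank


def describe(sample: Sample, bank: Dict[Tuple[str, str], PairScorer],
             classes: List[str], groups: List[str]) -> List[float]:
    """Descriptor: one response per pair, ordered class-major."""
    return [bank[(c, g)].score(sample[g]) for c in classes for g in groups]


# Hypothetical toy data: two classes, two feature groups.
classes = ["speech", "music"]
groups = ["mfcc", "perceptual"]
samples = [
    {"mfcc": [0.0, 0.1], "perceptual": [1.0]},
    {"mfcc": [0.1, 0.0], "perceptual": [0.9]},
    {"mfcc": [1.0, 1.1], "perceptual": [0.0]},
    {"mfcc": [1.1, 1.0], "perceptual": [0.1]},
]
labels = ["speech", "speech", "music", "music"]

bank = train_bank(samples, labels, classes, groups)
query = {"mfcc": [0.05, 0.05], "perceptual": [0.95]}
descriptor = describe(query, bank, classes, groups)
```

Here the descriptor has one entry per class and feature group pair (four in this toy setup), so inspecting the entries for a given class hints at which feature group responds most strongly to it. In the scheme the abstract describes, such descriptors would then be fed to an SVM for the final decision.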


DOI: 10.21437/Interspeech.2017-982

Cite as: Sonowal, S., Sandhan, T., Choi, I., Kim, N.S. (2017) Audio Classification Using Class-Specific Learned Descriptors. Proc. Interspeech 2017, 484-487, DOI: 10.21437/Interspeech.2017-982.


@inproceedings{Sonowal2017,
  author={Sukanya Sonowal and Tushar Sandhan and Inkyu Choi and Nam Soo Kim},
  title={Audio Classification Using Class-Specific Learned Descriptors},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={484--487},
  doi={10.21437/Interspeech.2017-982},
  url={http://dx.doi.org/10.21437/Interspeech.2017-982}
}