Unsupervised Filterbank Learning Using Convolutional Restricted Boltzmann Machine for Environmental Sound Classification

Hardik B. Sailor, Dharmesh M. Agrawal, Hemant A. Patil


In this paper, we propose to use a Convolutional Restricted Boltzmann Machine (ConvRBM) to learn a filterbank from raw audio signals. ConvRBM is a generative model, trained in an unsupervised way, that models audio signals of arbitrary length. ConvRBM is trained using the annealed dropout technique, and its parameters are optimized with Adam. The subband filters learned from the ESC-50 database resemble Fourier basis functions in the mid-frequency range, while some of the low-frequency subband filters resemble Gammatone basis functions. The learned auditory-like filterbank scale is nonlinear w.r.t. the center frequencies of the subband filters and follows standard auditory scales. We use the proposed model as a front-end for the Environmental Sound Classification (ESC) task, with a supervised Convolutional Neural Network (CNN) as the back-end. With the CNN classifier, the ConvRBM filterbank (ConvRBM-BANK) and its score-level fusion with Mel filterbank energies (FBEs) gave absolute improvements of 10.65% and 18.70% in classification accuracy, respectively, over FBEs alone on the ESC-50 database. This shows that the proposed ConvRBM filterbank contains highly complementary information to the Mel filterbank, which is helpful in the ESC task.
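To make the front-end idea concrete, here is a minimal sketch of a learned-filterbank feature extractor: raw audio is convolved with a bank of subband filters, rectified, pooled, and log-compressed to yield frame-level "filterbank energies". The random filter weights, pooling size, and log compression here are stand-in assumptions for illustration; in the paper the filters are learned by the ConvRBM and the processing details differ.

```python
import numpy as np

def conv_filterbank_features(signal, filters, pool=100):
    """Sketch of a filterbank front-end on raw audio.

    signal  : 1-D raw audio samples
    filters : (K, L) array, one subband filter per row
              (random stand-ins here; learned by ConvRBM in the paper)
    pool    : samples averaged per frame (hypothetical value)
    Returns a (K, frames) feature matrix.
    """
    feats = []
    for w in filters:
        resp = np.convolve(signal, w, mode="valid")   # subband response
        resp = np.abs(resp)                           # rectification (assumption)
        n = len(resp) // pool
        pooled = resp[: n * pool].reshape(n, pool).mean(axis=1)  # pooling
        feats.append(np.log(pooled + 1e-8))           # log compression
    return np.stack(feats, axis=0)

# toy usage with random "audio" and 8 hypothetical subband filters
rng = np.random.default_rng(0)
sig = rng.standard_normal(4000)
filt = rng.standard_normal((8, 64))
F = conv_filterbank_features(sig, filt)
```

The resulting (K, frames) map plays the role of Mel FBEs and can be fed to a CNN classifier; score-level fusion then combines the CNN posteriors obtained from this representation with those obtained from standard FBEs.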


DOI: 10.21437/Interspeech.2017-831

Cite as: Sailor, H.B., Agrawal, D.M., Patil, H.A. (2017) Unsupervised Filterbank Learning Using Convolutional Restricted Boltzmann Machine for Environmental Sound Classification. Proc. Interspeech 2017, 3107-3111, DOI: 10.21437/Interspeech.2017-831.


@inproceedings{Sailor2017,
  author={Hardik B. Sailor and Dharmesh M. Agrawal and Hemant A. Patil},
  title={Unsupervised Filterbank Learning Using Convolutional Restricted Boltzmann Machine for Environmental Sound Classification},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={3107--3111},
  doi={10.21437/Interspeech.2017-831},
  url={http://dx.doi.org/10.21437/Interspeech.2017-831}
}