Wavelet Transform Based Mel-scaled Features for Acoustic Scene Classification

Shefali Waldekar, Goutam Saha


Acoustic scene classification (ASC) is an audio signal processing task in which mel-scaled spectral features are widely used. These features, considered the de facto baseline in speech processing, traditionally employ Fourier-based transforms. Unlike speech, environmental audio spans a wider range of audible frequencies and may simultaneously contain short high-frequency transients and continuous low-frequency background noise. Wavelets, with their better time-frequency localization, can be considered more suitable for such signals. This paper approaches ASC through a novel use of wavelet-transform-based mel-scaled features. The proposed features are shown to possess better discriminative properties than other spectral features within a similar classification framework. The experiments are performed on two datasets that are similar in scene classes but differ in dataset size and in the length of the audio samples. When compared with two benchmark systems, one based on mel-frequency cepstral coefficients and Gaussian mixture models and the other based on log mel-band energies and a multi-layer perceptron, the proposed system performed considerably better on the test data.
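
As a rough illustration of how a wavelet transform can be combined with mel-scaled energies, the Python sketch below applies a discrete wavelet transform to one frame of log mel-band energies. It is not the authors' implementation, whose details are not given in this abstract. The Haar wavelet, the two decomposition levels, the function names `haar_dwt` and `wavelet_mel_features`, and the eight-band example frame are all assumptions made for the example.

```python
import numpy as np


def haar_dwt(x):
    """One level of the orthonormal Haar DWT of an even-length 1-D array.

    Returns (approximation, detail), each half the length of the input.
    """
    x = np.asarray(x, dtype=float)
    if x.size % 2:
        raise ValueError("input length must be even")
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # local averages (low-pass)
    detail = (even - odd) / np.sqrt(2.0)  # local differences (high-pass)
    return approx, detail


def wavelet_mel_features(log_mel, levels=2):
    """Multi-level Haar decomposition of one frame of log mel-band energies.

    Assumption: the wavelet transform is applied across the mel bands of a
    single frame. The output is ordered coarse to fine:
    [final approximation, coarsest detail, ..., finest detail].
    """
    approx = np.asarray(log_mel, dtype=float)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    return np.concatenate([approx] + details[::-1])


# Hypothetical frame of 8 log mel-band energies (illustrative values only).
frame = np.log(np.array([1.0, 2.0, 4.0, 8.0, 8.0, 4.0, 2.0, 1.0]))
features = wavelet_mel_features(frame, levels=2)
```

Because the Haar transform used here is orthonormal, the feature vector has the same length and the same total energy as the input frame. In practice a library such as PyWavelets could supply other wavelet families, and the frame length must be divisible by 2 raised to the number of levels.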


DOI: 10.21437/Interspeech.2018-2083

Cite as: Waldekar, S., Saha, G. (2018) Wavelet Transform Based Mel-scaled Features for Acoustic Scene Classification. Proc. Interspeech 2018, 3323-3327, DOI: 10.21437/Interspeech.2018-2083.


@inproceedings{Waldekar2018,
  author={Shefali Waldekar and Goutam Saha},
  title={Wavelet Transform Based Mel-scaled Features for Acoustic Scene Classification},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3323--3327},
  doi={10.21437/Interspeech.2018-2083},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2083}
}