12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Image Representation of the Subband Power Distribution for Robust Sound Classification

Jonathan Dennis, Huy Dat Tran, Haizhou Li

A*STAR, Singapore

This paper proposes a robust sound event classification method based on a selective image feature derived from the novel subband power distribution (SPD), which represents the distribution of power over frequency components. The method extends our previous work, which was motivated by the visual perception of the spectrogram to produce a robust feature for sound classification. Unlike the conventional spectrogram, the proposed SPD representation is invariant to time-shifting and is therefore suitable for real scenarios where the detected sound clips are not always balanced. Furthermore, we develop a missing feature classification method that automatically selects the sparse, representative areas of the signal from the noisy SPD image of the sound clip. The method is tested on a large database containing 50 sound classes under four different noise environments, in conditions ranging from clean to severe noise. A significant improvement in performance was obtained in mismatched conditions, with an average classification accuracy of 87.5% in the 0 dB noise condition.
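The time-shift invariance claimed for the SPD can be illustrated with a minimal sketch. The abstract does not give the exact SPD definition, so the code below assumes one plausible reading: for each frequency subband, a normalised histogram of log power taken over all time frames. The function name `subband_power_distribution`, the bin count, and the dB range are hypothetical choices made for illustration, not the authors' settings.

```python
import math

def subband_power_distribution(spectrogram, n_bins=8, floor_db=-60.0, ceil_db=0.0):
    """Per-subband histogram of log power over time (illustrative assumption).

    spectrogram: list of frames; each frame is a list of linear power values,
                 one per frequency subband.
    Returns one list per subband holding n_bins relative frequencies.
    """
    n_frames = len(spectrogram)
    n_bands = len(spectrogram[0])
    width = (ceil_db - floor_db) / n_bins
    counts = [[0] * n_bins for _ in range(n_bands)]
    for frame in spectrogram:
        for band, power in enumerate(frame):
            # Convert to dB, guarding against log of zero.
            level_db = 10.0 * math.log10(max(power, 1e-12))
            # Map the level to a histogram bin, clamping out-of-range values.
            idx = int((level_db - floor_db) / width)
            idx = min(max(idx, 0), n_bins - 1)
            counts[band][idx] += 1
    # Normalise so each subband's histogram sums to 1.
    return [[c / n_frames for c in row] for row in counts]

# Toy spectrogram: 3 frames, 2 subbands (made-up values).
frames = [[1.0, 0.001], [0.1, 0.001], [0.01, 0.5]]
# Circularly shift the clip in time by one frame.
shifted = frames[1:] + frames[:1]

# The histogram discards frame order, so both clips give the same SPD.
assert subband_power_distribution(frames) == subband_power_distribution(shifted)
```

Because the histogram ignores the order of frames, any reordering of the same frames yields an identical representation, whereas the spectrogram itself changes. This sketch covers only that one property. It does not attempt the image feature extraction or the missing feature classification described above.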


Bibliographic reference. Dennis, Jonathan / Tran, Huy Dat / Li, Haizhou (2011): "Image representation of the subband power distribution for robust sound classification", in INTERSPEECH-2011, 2437-2440.