The time-frequency spectrogram representation of an audio signal can be visually analysed by a trained researcher to recognise any underlying sound events in a process called spectrogram reading. However, this has not become a popular approach for automatic classification, as the field is driven by Automatic Speech Recognition (ASR) where frame-based features are popular. As opposed to speech, sound events typically have a more distinctive time-frequency representation, with the energy concentrated in a small number of spectral components. This makes them more suitable for classification based on their visual signature, and enables inspiration to be found in techniques from the related field of image processing. Recently, there have been a range of techniques that extract image processing-inspired features from the spectrogram for sound event classification. In this paper, we introduce the idea and structure behind six recent spectrogram image methods and analyse their performance on a large database containing 50 different environmental sounds to give a standardised comparison that is not often available in sound event classification tasks.
Bibliographic reference. Dennis, Jonathan / Tran, Huy Dat / Chng, Eng Siong (2014): "Analysis of spectrogram image methods for sound event classification", In INTERSPEECH-2014, 2533-2537.