13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Local-feature-map Integration Using Convolutional Neural Networks for Music Genre Classification

Toru Nakashika (1), Christophe Garcia (2), Tetsuya Takiguchi (1)

(1) Department of System Informatics, Kobe University, Kobe, Japan
(2) LIRIS, CNRS, INSA de Lyon, Villeurbanne, France

A map-based approach, which treats 2-dimensional acoustic features using image analysis, has recently attracted attention in music genre classification. While this approach extracts local music patterns more successfully than frame-based methods, in most works the extracted features are not sufficient for music genre classification. In this paper, we focus on appropriate feature extraction and proper classification by integrating automatically learnt image features. For musical feature extraction, we build gray-level co-occurrence matrix (GLCM) descriptors with different offsets from a short-term mel spectrogram. These feature maps are then classified jointly using convolutional neural networks (ConvNets). In our experiments, we obtained a large improvement of more than 10 points in classification accuracy on the GTZAN database compared with other ConvNet-based methods.
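As an illustration of the GLCM step mentioned in the abstract, the following minimal sketch computes co-occurrence matrices at several offsets from a 2-dimensional map and stacks them as input channels. It is a generic GLCM written under stated assumptions, not the authors' exact descriptor: the number of gray levels, the offset list, the helper names `quantize` and `glcm`, and the random array standing in for a short-term mel spectrogram are all hypothetical choices made for this example.

```python
import numpy as np

def quantize(spec, levels=8):
    """Map a real-valued 2-D array to integer gray levels 0..levels-1."""
    lo, hi = spec.min(), spec.max()
    if hi == lo:
        return np.zeros(spec.shape, dtype=np.int64)
    scaled = (spec - lo) / (hi - lo)
    # The maximum value scales to `levels`, so clip it into the top bin.
    return np.minimum((scaled * levels).astype(np.int64), levels - 1)

def glcm(gray, offset, levels=8):
    """Count gray-level pairs (i, j) for cells separated by offset = (d_row, d_col)."""
    dr, dc = offset
    rows, cols = gray.shape
    # Restrict to reference cells whose neighbour also lies inside the map.
    r0, r1 = max(0, -dr), min(rows, rows - dr)
    c0, c1 = max(0, -dc), min(cols, cols - dc)
    ref = gray[r0:r1, c0:c1]
    nbr = gray[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
    mat = np.zeros((levels, levels), dtype=np.int64)
    # np.add.at accumulates correctly when the same (i, j) index repeats.
    np.add.at(mat, (ref.ravel(), nbr.ravel()), 1)
    return mat

# Stand-in for a short-term mel spectrogram (mel bands x frames); assumed data.
rng = np.random.default_rng(0)
stub_spectrogram = rng.random((40, 60))

# Hypothetical offsets: right, down, down-right, up-right.
offsets = [(0, 1), (1, 0), (1, 1), (-1, 1)]
gray = quantize(stub_spectrogram, levels=8)
maps = np.stack([glcm(gray, off, levels=8) for off in offsets])
print(maps.shape)
```

Each offset yields one levels-by-levels map, so the stacked array can be fed to a ConvNet as a multi-channel image. How the maps are normalised and combined in the paper itself is not stated in this abstract.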

Index Terms: music genre classification, music information retrieval, music feature extraction, convolutional neural networks

Bibliographic reference. Nakashika, Toru / Garcia, Christophe / Takiguchi, Tetsuya (2012): "Local-feature-map integration using convolutional neural networks for music genre classification", in INTERSPEECH-2012, 1752-1755.