16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

“Multilingual” Deep Neural Network for Music Genre Classification

Jia Dai (1), Wenju Liu (1), Chongjia Ni (2), Like Dong (3), Hong Yang (3)

(1) Chinese Academy of Sciences, China
(2) SDUFE, China
(3) SGCC, China

Multilingual deep neural networks (DNNs) have been widely used in low-resource automatic speech recognition (ASR), either to balance rich-resource and low-resource speech recognition or to build a low-resource ASR system quickly. Inspired by this idea, we use a “multilingual” DNN (Multi-DNN) for music genre classification. Music has no counterpart to multiple languages, so we use a similar resource instead. To obtain a similar resource that corresponds to the small target database, the nearest neighbor (NN) algorithm is used to re-label a large similar database. The re-labeled large similar database is then used to train a Multi-DNN, and the small target database is used to further adapt the trained Multi-DNN. With this approach, the DNN can be trained well and transferred to the small target database quickly. Experiments are carried out on two benchmark databases, ISMIR and GTZAN, which serve as the large similar database and the small target database, respectively. The results show that the proposed method achieves 93.4% average classification accuracy (10-fold cross-validation) on the GTZAN database, which outperforms the best previously reported performance on this database.
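The re-labeling step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the features, the distance measure, or how ties are handled, so the following rests on assumptions: each track is represented by a fixed-length feature vector, distance is Euclidean, and every track in the large similar database receives the genre label of its single nearest track in the small target database. The function name `relabel_by_nearest_neighbor` and the toy data are hypothetical and are not taken from the paper.

```python
import math


def relabel_by_nearest_neighbor(target_feats, target_labels, similar_feats):
    """Give each sample of the large similar database the label of its
    nearest sample in the small target database.

    Assumptions (not stated in the abstract): fixed-length feature
    vectors, Euclidean distance, and a single nearest neighbor.
    """
    relabeled = []
    for x in similar_feats:
        # Index of the target sample closest to x.
        nearest = min(
            range(len(target_feats)),
            key=lambda i: math.dist(x, target_feats[i]),
        )
        relabeled.append(target_labels[nearest])
    return relabeled


# Toy 2-D features standing in for real audio features (hypothetical data).
target_feats = [(0.0, 0.0), (10.0, 10.0)]    # small target database
target_labels = ["rock", "jazz"]
similar_feats = [(1.0, -1.0), (9.0, 11.0), (4.0, 4.0)]  # large similar database

new_labels = relabel_by_nearest_neighbor(target_feats, target_labels, similar_feats)
print(new_labels)
```

In the pipeline the abstract outlines, labels produced this way would be used to train the Multi-DNN on the large similar database before it is adapted on the small target database. The brute-force search above is only meant to make the idea concrete and would likely need an indexed or vectorized search at realistic database sizes.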

Bibliographic reference. Dai, Jia / Liu, Wenju / Ni, Chongjia / Dong, Like / Yang, Hong (2015): "“Multilingual” deep neural network for music genre classification", In INTERSPEECH-2015, 2907-2911.