9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

A Combination of Data Mining Method with Decision Trees Building for Speech/Music Discrimination

Qiong Wu (1), Qin Yan (2), Jun Wang (1), Jun Hong (1)

(1) Chinese Academy of Sciences, China; (2) Hohai University, China

Multimedia applications now require a speech/music classifier to offer more than accuracy alone, notably low delay and low complexity. Here, we build such a classifier using different data mining methods. The main work of this paper is to analyze the inherent validity of diverse features extracted from the audio, combine them into two subsets, and build a hierarchical structure of decision trees that maintains optimal performance. The classifier is evaluated on a set of 450 audio files, each 5 to 11 minutes long, covering different types of speech and music. It outperforms AMR-WB+, achieving correct classification rates of 97.6% and 95.2% at the 10 ms frame level in pure and high-SNR (>= 20 dB) environments, respectively. In addition, its complexity is lower than 1 WMOPS, which makes it easy to adapt to many scenarios.
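The hierarchical arrangement and the frame-level scoring described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the features, thresholds, or tree shapes, so the feature names (`zcr`, `spectral_flux`), the threshold values, and the rule that the first tree defers undecided frames to the second are all assumptions made for illustration.

```python
from typing import Dict, Optional, Sequence

Frame = Dict[str, float]  # hypothetical per-frame features for one 10 ms frame


def stage_one(frame: Frame) -> Optional[str]:
    """First-level tree on a cheap feature subset (hypothetical thresholds)."""
    zcr = frame["zcr"]
    if zcr > 0.30:
        return "speech"
    if zcr < 0.05:
        return "music"
    return None  # undecided: defer to the second-level tree


def stage_two(frame: Frame) -> str:
    """Second-level tree on the other feature subset (hypothetical threshold)."""
    return "speech" if frame["spectral_flux"] > 0.5 else "music"


def classify(frame: Frame) -> str:
    """Run the trees in sequence; the second is evaluated only when needed."""
    label = stage_one(frame)
    return label if label is not None else stage_two(frame)


def frame_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of frames whose predicted label matches the reference label."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("label sequences must be non-empty and equally long")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Toy data, invented for this example only.
frames = [
    {"zcr": 0.40, "spectral_flux": 0.1},
    {"zcr": 0.01, "spectral_flux": 0.9},
    {"zcr": 0.10, "spectral_flux": 0.7},
    {"zcr": 0.10, "spectral_flux": 0.2},
]
predictions = [classify(f) for f in frames]
accuracy = frame_accuracy(predictions, ["speech", "music", "speech", "speech"])
```

Deferring only the ambiguous frames to a second tree is one way a hierarchy can keep the average per-frame cost low, since most frames would be settled by the cheaper test. Whether the paper's hierarchy works this way is not stated in the abstract.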


Bibliographic reference. Wu, Qiong / Yan, Qin / Wang, Jun / Hong, Jun (2008): "A combination of data mining method with decision trees building for speech/music discrimination", in INTERSPEECH-2008, 2534-2537.