INTERSPEECH 2006 - ICSLP
Speech/music discrimination is a challenging research problem that significantly impacts Automatic Speech Recognition (ASR) performance. This paper proposes new features for the speech/music discrimination task. We propose a wavelet-based decomposition of the audio signal, which allows a good analysis of non-stationary signals such as speech or music. We compute different energy types in each frequency band obtained from the wavelet decomposition. Two class/non-class classifiers are used: one for speech/non-speech and one for music/non-music. On the different test corpora, the proposed wavelet approach gives better results than the MFCC-based one. For instance, we obtain a significant relative improvement in error rate of 58.0% on the "Scheirer" corpus for the speech/music discrimination task.
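As an illustration of the idea of per-band energies from a wavelet decomposition, here is a minimal sketch in Python. It is not the authors' implementation: the abstract names neither the wavelet family, the number of decomposition levels, nor the energy types, so the Haar wavelet, the three levels, and the mean-square energy used here are assumptions made only for the example. The function names `haar_dwt_step` and `band_energies` are hypothetical.

```python
import math


def haar_dwt_step(signal):
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail) coefficient lists.
    The Haar wavelet is an assumption; the abstract does not name a wavelet.
    """
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def band_energies(frame, levels=3):
    """Mean-square energy per band for one frame of samples.

    Order of the result: detail band of level 1 (highest frequencies),
    then deeper detail bands, then the final approximation band.
    Mean-square energy is an assumed choice of "energy type".
    """
    energies = []
    approx = list(frame)
    for _ in range(levels):
        if len(approx) < 2:
            break  # frame too short for another level
        approx, detail = haar_dwt_step(approx)
        energies.append(sum(c * c for c in detail) / len(detail))
    energies.append(sum(c * c for c in approx) / len(approx))
    return energies


if __name__ == "__main__":
    # A rapidly alternating frame versus a constant frame (toy inputs).
    print(band_energies([1.0, -1.0] * 4))
    print(band_energies([1.0] * 8))
```

With these toy inputs, the alternating frame puts all of its energy in the first (highest-frequency) detail band, while the constant frame puts all of its energy in the final approximation band. A feature vector of this kind, computed per frame, is the sort of input a speech/non-speech or music/non-music classifier could be trained on.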
Bibliographic reference. Didiot, E. / Illina, I. / Mella, O. / Fohr, D. / Haton, Jean-Paul (2006): "A wavelet-based parameterization for speech/music segmentation", In INTERSPEECH-2006, paper 1361-Mon3CaP.5.