Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

A Wavelet-Based Parameterization for Speech/Music Segmentation

E. Didiot, I. Illina, O. Mella, D. Fohr, Jean-Paul Haton

LORIA, France

Speech/music discrimination is a challenging research problem that significantly affects Automatic Speech Recognition (ASR) performance. This paper proposes new features for the speech/music discrimination task. We propose a wavelet-based decomposition of the audio signal, which is well suited to the analysis of non-stationary signals such as speech or music. We compute several types of energy in each frequency band obtained from the wavelet decomposition. Two class/non-class classifiers are used: one for speech/non-speech and one for music/non-music. On the different test corpora, the proposed wavelet approach gives better results than the MFCC-based one. For instance, we obtain a significant relative improvement of 58.0% in the error rate on the "Scheirer" corpus for the speech/music discrimination task.
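
As an illustration of the idea of per-band energy features from a wavelet decomposition, the following is a minimal sketch and not the authors' implementation. It assumes a Haar wavelet, three decomposition levels, and mean squared coefficient energy per band; the abstract does not specify the wavelet family, the number of levels, or the exact energy definitions, and the function names `haar_dwt_step` and `wavelet_band_energies` are hypothetical.

```python
import math


def haar_dwt_step(signal):
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail) coefficient lists, each half the
    length of the input. Assumption: Haar is used only for simplicity.
    """
    scale = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * scale for i in pairs]
    return approx, detail


def wavelet_band_energies(frame, levels=3):
    """Mean energy per wavelet band for one analysis frame.

    The result lists the detail bands from finest (highest frequency)
    to coarsest, followed by the final approximation band.
    Assumption: "energy" here is the mean of squared coefficients.
    """
    if levels < 1 or len(frame) % (2 ** levels) != 0:
        raise ValueError("frame length must be a multiple of 2**levels")
    energies = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        energies.append(sum(c * c for c in detail) / len(detail))
    energies.append(sum(c * c for c in approx) / len(approx))
    return energies


if __name__ == "__main__":
    # A rapidly alternating frame puts its energy in the finest detail band.
    print(wavelet_band_energies([1.0, -1.0] * 4))
    # A constant frame puts its energy in the approximation band.
    print(wavelet_band_energies([1.0] * 8))
```

In such a scheme, the resulting energy vector for each frame would serve as the feature vector passed to the speech/non-speech and music/non-music classifiers, in place of MFCC coefficients.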


Bibliographic reference. Didiot, E. / Illina, I. / Mella, O. / Fohr, D. / Haton, Jean-Paul (2006): "A wavelet-based parameterization for speech/music segmentation", in INTERSPEECH-2006, paper 1361-Mon3CaP.5.