Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Two-Class Signal Segmentation for Speech/Music Detection in Audio Tracks

Mouhamadou Seck, Frédéric Bimbot, Didier Zugaj, Bernard Delyon

IRISA SIGMA2 / INRIA & C.N.R.S., Campus Universitaire de Beaulieu, Rennes, France

We present a technique for the segmentation of a sound track into two classes of segments. Each frame of signal is preprocessed by extracting cepstral coefficients and their first-order derivatives. For each class, the distribution of the frame parameter vectors is modeled by a Gaussian Mixture Model (GMM). The GMM order is selected using two criteria: the Minimum Description Length (MDL) criterion and the Akaike Information Criterion (AIC). The score of each frame is based on a weighted log-likelihood ratio computed in a window around the frame. The decision for each frame is taken by comparing its score to a threshold. Experiments are presented on speech/music segmentation in audio tracks. In these experiments, the MDL criterion leads to a reasonable GMM order. Using the MDL criterion for GMM order selection, the frame classification error rate is around 20%. However, using GMMs with much lower orders degrades performance only marginally.
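
The two steps described above (model order selection, then windowed frame scoring) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact form of the criteria, the covariance structure, or the window weights. The sketch assumes diagonal-covariance GMMs, the common forms AIC = -2 log L + 2k and MDL = -log L + (k/2) log N, and a triangular weighting window. All function names and the toy numbers are hypothetical, and the per-frame log-likelihoods are taken as given rather than computed from trained models.

```python
import math


def gmm_free_params(order, dim):
    """Free parameters of a diagonal-covariance GMM (assumed structure):
    (order - 1) mixture weights + order * dim means + order * dim variances."""
    return (order - 1) + 2 * order * dim


def aic(log_lik, n_params):
    """Akaike Information Criterion in its common form (lower is better)."""
    return -2.0 * log_lik + 2.0 * n_params


def mdl(log_lik, n_params, n_frames):
    """Minimum Description Length in its common two-part form (lower is better)."""
    return -log_lik + 0.5 * n_params * math.log(n_frames)


def select_order(log_liks, dim, n_frames, criterion="mdl"):
    """Pick the GMM order minimising the chosen criterion.

    log_liks maps each candidate order to the total training
    log-likelihood obtained with a GMM of that order.
    """
    def cost(order):
        k = gmm_free_params(order, dim)
        if criterion == "mdl":
            return mdl(log_liks[order], k, n_frames)
        return aic(log_liks[order], k)

    return min(log_liks, key=cost)


def frame_scores(ll_a, ll_b, half_width):
    """Weighted log-likelihood ratio in a window around each frame.

    ll_a[t] and ll_b[t] are the log-likelihoods of frame t under the
    class-A and class-B models. Triangular weights are an assumption;
    the window is truncated at the signal boundaries.
    """
    n = len(ll_a)
    scores = []
    for t in range(n):
        num, den = 0.0, 0.0
        for k in range(-half_width, half_width + 1):
            j = t + k
            if 0 <= j < n:
                w = half_width + 1 - abs(k)
                num += w * (ll_a[j] - ll_b[j])
                den += w
        scores.append(num / den)
    return scores


def classify(scores, threshold=0.0):
    """Label a frame 'A' when its score exceeds the threshold, else 'B'."""
    return ["A" if s > threshold else "B" for s in scores]


# Toy example: hypothetical training log-likelihoods for three candidate orders.
toy_log_liks = {1: -3000.0, 2: -2900.0, 4: -2885.0}
order_mdl = select_order(toy_log_liks, dim=2, n_frames=1000, criterion="mdl")
order_aic = select_order(toy_log_liks, dim=2, n_frames=1000, criterion="aic")

# Toy example: one outlier frame inside a class-A region.
ll_a = [0.0, 0.0, 0.0, -2.0, 0.0, 0.0]
ll_b = [-1.0] * 6
labels_raw = classify(frame_scores(ll_a, ll_b, half_width=0))
labels_smooth = classify(frame_scores(ll_a, ll_b, half_width=2))
```

With these toy numbers, MDL penalises the extra parameters more heavily than AIC and therefore picks a lower order, and the windowed score absorbs the isolated outlier frame that a frame-by-frame decision would misclassify.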

Bibliographic reference.  Seck, Mouhamadou / Bimbot, Frédéric / Zugaj, Didier / Delyon, Bernard (1999): "Two-class signal segmentation for speech/music detection in audio tracks", In EUROSPEECH'99, 2801-2804.