ISCA Archive Eurospeech 1999

Two-class signal segmentation for speech/music detection in audio tracks

Mouhamadou Seck, Frédéric Bimbot, Didier Zugaj, Bernard Delyon

We present a technique for segmenting a sound track into two classes of segments. Each signal frame is preprocessed by extracting cepstral coefficients and their first-order derivatives. For each class, the distribution of the frame parameter vectors is modeled by a Gaussian Mixture Model (GMM). The GMM order is selected using two criteria: the Minimum Description Length (MDL) criterion and the Akaike Information Criterion (AIC). The score of each frame is based on a weighted log-likelihood ratio computed over a window around the frame, and the decision for each frame is taken by comparing its score to a threshold. Experiments are presented on speech/music segmentation in audio tracks. In these experiments, the MDL criterion leads to a reasonable GMM order. Using the MDL criterion for GMM order selection, the frame classification error rate is around 20%. However, using GMMs of much lower order degrades performance only marginally.
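The processing chain summarized above can be sketched as follows. This is a minimal, stdlib-only Python illustration under stated assumptions, not the authors' implementation. It assumes diagonal-covariance GMMs whose per-order training log-likelihoods are computed elsewhere (e.g. by EM), the common asymptotic forms of MDL and AIC, a standard regression formula for the first-order derivatives, and hypothetical triangular window weights, because the abstract does not specify the weighting. All function names are illustrative.

```python
import math
from typing import Dict, List, Sequence


def deltas(frames: Sequence[Sequence[float]], n: int = 2) -> List[List[float]]:
    """First-order derivatives via the usual regression over +/- n frames.

    Assumption: a common delta formula; edge frames are repeated.
    """
    T = len(frames)
    denom = 2 * sum(i * i for i in range(1, n + 1))
    out = []
    for t in range(T):
        row = []
        for j in range(len(frames[t])):
            acc = 0.0
            for i in range(1, n + 1):
                nxt = frames[min(t + i, T - 1)][j]
                prv = frames[max(t - i, 0)][j]
                acc += i * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def gmm_num_params(order: int, dim: int) -> int:
    # Assumption: diagonal covariances.
    # (order - 1) free weights + order*dim means + order*dim variances
    return (order - 1) + 2 * order * dim


def aic(log_lik: float, k: int) -> float:
    # Akaike Information Criterion; lower is better
    return -2.0 * log_lik + 2.0 * k


def mdl(log_lik: float, k: int, n: int) -> float:
    # Two-part MDL code length in its common asymptotic form; lower is better
    return -log_lik + 0.5 * k * math.log(n)


def select_order(log_lik_by_order: Dict[int, float], dim: int, n: int,
                 criterion: str = "mdl") -> int:
    """Return the candidate order minimising the chosen criterion.

    log_lik_by_order maps each order to the total training log-likelihood
    of a GMM of that order (fitted elsewhere, e.g. by EM).
    """
    def cost(order: int) -> float:
        k = gmm_num_params(order, dim)
        ll = log_lik_by_order[order]
        return mdl(ll, k, n) if criterion == "mdl" else aic(ll, k)
    return min(log_lik_by_order, key=cost)


def frame_scores(ll_a: Sequence[float], ll_b: Sequence[float],
                 half_width: int) -> List[float]:
    """Weighted mean of frame log-likelihood ratios in a window around each frame.

    Hypothetical triangular weights w(d) = half_width + 1 - |d|;
    the window is truncated at the signal edges.
    """
    llr = [a - b for a, b in zip(ll_a, ll_b)]
    scores = []
    for t in range(len(llr)):
        num = den = 0.0
        for d in range(-half_width, half_width + 1):
            u = t + d
            if 0 <= u < len(llr):
                w = half_width + 1 - abs(d)
                num += w * llr[u]
                den += w
        scores.append(num / den)
    return scores


def classify(scores: Sequence[float], threshold: float = 0.0) -> List[str]:
    # A frame is assigned to class "A" (e.g. speech) when its score exceeds the threshold
    return ["A" if s > threshold else "B" for s in scores]


if __name__ == "__main__":
    ll_speech = [-10, -10, -10, -20, -20, -20]  # toy per-frame log-likelihoods
    ll_music = [-20, -20, -20, -10, -10, -10]
    s = frame_scores(ll_speech, ll_music, half_width=1)
    print(s, classify(s))
    lls = {1: -5000.0, 2: -4900.0, 4: -4880.0}  # toy training log-likelihoods
    print(select_order(lls, dim=2, n=1000, criterion="mdl"),
          select_order(lls, dim=2, n=1000, criterion="aic"))
```

With these toy numbers the windowed scores are `[10.0, 10.0, 5.0, -5.0, -10.0, -10.0]`, giving three "A" frames followed by three "B" frames, and MDL selects order 2 while AIC selects order 4. On a common scale, MDL charges (1/2) log N per parameter against AIC's 1, so it favours smaller models whenever N exceeds e² ≈ 7.4 training frames.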


doi: 10.21437/Eurospeech.1999-618

Cite as: Seck, M., Bimbot, F., Zugaj, D., Delyon, B. (1999) Two-class signal segmentation for speech/music detection in audio tracks. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 2801-2804, doi: 10.21437/Eurospeech.1999-618

@inproceedings{seck99_eurospeech,
  author={Mouhamadou Seck and Frédéric Bimbot and Didier Zugaj and Bernard Delyon},
  title={{Two-class signal segmentation for speech/music detection in audio tracks}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={2801--2804},
  doi={10.21437/Eurospeech.1999-618}
}