We present a technique for segmenting a sound track into two classes of segments. Each signal frame is preprocessed by extracting cepstral coefficients and their first-order derivatives. For each class, the distribution of the frame parameter vectors is modeled by a Gaussian Mixture Model (GMM). The GMM order is selected using two criteria: the Minimum Description Length (MDL) criterion and the Akaike Information Criterion (AIC). The score of a frame is based on a weighted log-likelihood ratio computed over a window around that frame, and the decision for each frame is taken by comparing its score to a threshold. Experiments are presented on speech/music segmentation in audio tracks. In these experiments, the MDL criterion leads to a reasonable GMM order. Using the MDL criterion for GMM order selection, the frame classification error rate is around 20%. However, using GMMs of much lower order degrades performance only marginally.
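The abstract describes the pipeline only at a high level. The following is a minimal sketch of it, not the authors' implementation, and it rests on several assumptions. The GMMs use diagonal covariances and are already trained (e.g. by EM, not shown). Deltas use a common regression formula. The window weights are triangular, which is a hypothetical choice since the paper's weighting is not given here. The frame score is taken as a normalized weighted average of per-frame log-likelihood ratios. AIC and MDL use their textbook forms. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation of a diagonal-covariance GMM:
# a list of (weight, means, variances) components.
Component = Tuple[float, List[float], List[float]]


def append_deltas(cepstra: Sequence[Sequence[float]], n: int = 2) -> List[List[float]]:
    """Append first-order derivatives using a common regression formula (assumed)."""
    T = len(cepstra)
    denom = 2 * sum(i * i for i in range(1, n + 1))
    out = []
    for t in range(T):
        delta = []
        for d in range(len(cepstra[t])):
            # Edge frames are replicated at the boundaries.
            acc = sum(i * (cepstra[min(t + i, T - 1)][d] - cepstra[max(t - i, 0)][d])
                      for i in range(1, n + 1))
            delta.append(acc / denom)
        out.append(list(cepstra[t]) + delta)
    return out


def gmm_log_likelihood(x: Sequence[float], gmm: List[Component]) -> float:
    """log p(x) = logsumexp_k [log w_k + log N(x; mu_k, diag(var_k))]."""
    terms = []
    for w, mu, var in gmm:
        ll = math.log(w)
        for xi, m, v in zip(x, mu, var):
            ll -= 0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        terms.append(ll)
    top = max(terms)  # subtract the max for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def num_params(n_components: int, dim: int) -> int:
    """Free parameters: K-1 weights, K*d means, K*d diagonal variances."""
    return (n_components - 1) + 2 * n_components * dim


def aic(total_ll: float, n_params: int) -> float:
    return -2.0 * total_ll + 2.0 * n_params


def mdl(total_ll: float, n_params: int, n_frames: int) -> float:
    # Two-part description length: -log L + (k/2) log N.
    return -total_ll + 0.5 * n_params * math.log(n_frames)


def select_order(candidates: List[Tuple[int, float]], dim: int, n_frames: int,
                 criterion: str = "mdl") -> int:
    """Pick the order K minimizing the criterion; candidates are (K, total log-likelihood)."""
    best_k, best_cost = -1, math.inf
    for k, total_ll in candidates:
        p = num_params(k, dim)
        cost = mdl(total_ll, p, n_frames) if criterion == "mdl" else aic(total_ll, p)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


def frame_scores(frames: Sequence[Sequence[float]], gmm_speech: List[Component],
                 gmm_music: List[Component], half_width: int,
                 weights: Optional[List[float]] = None) -> List[float]:
    """Weighted average of per-frame LLRs in a window around each frame."""
    llr = [gmm_log_likelihood(f, gmm_speech) - gmm_log_likelihood(f, gmm_music)
           for f in frames]
    if weights is None:  # hypothetical triangular window, e.g. [1, 2, 1]
        weights = [half_width + 1 - abs(j) for j in range(-half_width, half_width + 1)]
    scores = []
    for t in range(len(llr)):
        num = den = 0.0
        for j in range(-half_width, half_width + 1):
            if 0 <= t + j < len(llr):  # truncate the window at the edges
                w = weights[j + half_width]
                num += w * llr[t + j]
                den += w
        scores.append(num / den)
    return scores


def classify(scores: List[float], threshold: float = 0.0) -> List[str]:
    """Threshold decision: positive LLR favors the speech model."""
    return ["speech" if s > threshold else "music" for s in scores]
```

For example, with one-component 1-D models centered at 0 (speech) and 3 (music), five frames at 0 followed by five frames at 3 are labeled five "speech" then five "music". In a toy order-selection case (log-likelihoods -1500, -1480 and -1470 for K = 1, 2, 4 over 1000 one-dimensional frames), MDL picks K = 2 while AIC picks K = 4. This illustrates how MDL's log N penalty favors smaller models.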
Cite as: Seck, M., Bimbot, F., Zugaj, D., Delyon, B. (1999) Two-class signal segmentation for speech/music detection in audio tracks. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 2801-2804, doi: 10.21437/Eurospeech.1999-618
@inproceedings{seck99_eurospeech, author={Mouhamadou Seck and Frédéric Bimbot and Didier Zugaj and Bernard Delyon}, title={{Two-class signal segmentation for speech/music detection in audio tracks}}, year=1999, booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)}, pages={2801--2804}, doi={10.21437/Eurospeech.1999-618} }