Sixth European Conference on Speech Communication and Technology
We present a technique for segmenting a sound track into two classes of segments. Each signal frame is preprocessed by extracting cepstral coefficients and their first-order derivatives. For each class, the distribution of the frame parameter vectors is modeled by a Gaussian Mixture Model (GMM). The GMM order is selected using two criteria: the Minimum Description Length (MDL) criterion and the Akaike Information Criterion (AIC). The frame score is based on a weighted log-likelihood ratio computed in a window around the frame. The decision for each frame is made by comparing its score to a threshold. Experiments are presented on speech/music segmentation in audio tracks. In these experiments, the MDL criterion leads to a reasonable GMM order. Using the MDL criterion for GMM order selection, the frame classification error rate is around 20%. However, using GMMs with much lower orders degrades performance only marginally.
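The order-selection and frame-scoring steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not specify: diagonal covariances for the parameter count, the common forms AIC = -2 log L + 2k and MDL = -log L + (k/2) log N, triangular window weights, a threshold of 0, and all function names. The per-frame log-likelihoods are assumed to come from two GMMs that have already been trained.

```python
import math

def gmm_num_params(order, dim):
    # Free parameters of a GMM with `order` components in `dim` dimensions.
    # Assumption: diagonal covariances (means + variances + mixture weights).
    return order * 2 * dim + (order - 1)

def aic(log_lik, n_params):
    # Akaike Information Criterion (lower is better).
    return -2.0 * log_lik + 2.0 * n_params

def mdl(log_lik, n_params, n_frames):
    # Minimum Description Length in its common two-part form (lower is better).
    return -log_lik + 0.5 * n_params * math.log(n_frames)

def select_order(log_liks, dim, n_frames, criterion="mdl"):
    # log_liks maps each candidate GMM order to the total training
    # log-likelihood obtained with that order.
    def cost(order):
        k = gmm_num_params(order, dim)
        if criterion == "aic":
            return aic(log_liks[order], k)
        return mdl(log_liks[order], k, n_frames)
    return min(log_liks, key=cost)

def windowed_llr(ll_speech, ll_music, half_width):
    # Per-frame log-likelihood ratio, averaged over a window centred on each
    # frame. Assumption: triangular weights, truncated at the signal edges.
    llr = [a - b for a, b in zip(ll_speech, ll_music)]
    scores = []
    for t in range(len(llr)):
        num = den = 0.0
        for k in range(-half_width, half_width + 1):
            if 0 <= t + k < len(llr):
                w = half_width + 1 - abs(k)
                num += w * llr[t + k]
                den += w
        scores.append(num / den)
    return scores

def classify(scores, threshold=0.0):
    # Compare each frame score to a threshold (0.0 is a placeholder value).
    return ["speech" if s > threshold else "music" for s in scores]

if __name__ == "__main__":
    # Hypothetical training log-likelihoods for three candidate orders.
    print(select_order({1: -5000.0, 4: -4000.0, 64: -3990.0}, dim=3, n_frames=1000))
    # One isolated frame favours "music"; windowing smooths it out.
    scores = windowed_llr([2, 2, -1, 2, 2], [0, 0, 0, 0, 0], half_width=1)
    print(classify(scores))
```

In this toy example the penalty term makes both criteria reject the largest model despite its slightly higher likelihood, and the windowed score keeps a single outlier frame from flipping the decision.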
Bibliographic reference. Seck, Mouhamadou / Bimbot, Frédéric / Zugaj, Didier / Delyon, Bernard (1999): "Two-class signal segmentation for speech/music detection in audio tracks", In EUROSPEECH'99, 2801-2804.