7th International Conference on Spoken Language Processing
September 16-20, 2002
This paper presents a novel approach to speech / music segmentation. Three original features are extracted: entropy modulation, stationary segment duration, and number of segments. They are merged with the classical 4 Hz modulation energy. The relevance of these features is studied in a first experiment on a development corpus composed of collected samples of speech and music. A second corpus is used to verify the robustness of the algorithm: this experiment, run on a TV movie soundtrack, reaches a correct identification rate of 90%.
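To make the classical 4 Hz modulation energy feature more concrete, the following is a minimal sketch, not the authors' procedure. It relies on the general observation that the energy envelope of speech fluctuates at roughly the syllabic rate, around 4 Hz. The function name `modulation_energy_ratio`, the 2-8 Hz band, and the 10 ms frame length are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
import math


def modulation_energy_ratio(signal, sample_rate, band=(2.0, 8.0), frame_s=0.010):
    """Share of energy-envelope modulation power that falls inside `band` (Hz).

    Hypothetical helper: the band edges and frame length are illustrative
    assumptions, not values taken from the paper.
    """
    frame_len = int(round(sample_rate * frame_s))
    if frame_len < 1:
        return 0.0
    n_frames = len(signal) // frame_len
    if n_frames < 2:
        return 0.0

    # Short-time energy envelope: mean squared amplitude per non-overlapping frame.
    env = [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]

    # Remove the mean so the constant (0 Hz) component does not dominate.
    mean = sum(env) / n_frames
    env = [e - mean for e in env]

    env_rate = sample_rate / frame_len  # sampling rate of the envelope, in Hz
    in_band = 0.0
    total = 0.0
    # Plain DFT of the envelope over the positive-frequency bins.
    for k in range(1, n_frames // 2 + 1):
        freq = k * env_rate / n_frames
        re = sum(e * math.cos(2 * math.pi * k * n / n_frames) for n, e in enumerate(env))
        im = sum(e * math.sin(2 * math.pi * k * n / n_frames) for n, e in enumerate(env))
        power = re * re + im * im
        total += power
        if band[0] <= freq <= band[1]:
            in_band += power

    return in_band / total if total > 0 else 0.0
```

Under these assumptions, a tone whose amplitude is modulated at 4 Hz should score much higher than a steady tone, which is the kind of contrast a speech / music classifier could exploit. A real system would more likely compute this per frequency band with a band-pass filter on the envelope rather than a single full-band DFT.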
Bibliographic reference. Pinquier, Julien / Rouas, Jean-Luc / André-Obrecht, Régine (2002): "Robust speech / music classification in audio documents", In ICSLP-2002, 2005-2008.