Acoustic signals from different sources in a natural environment form an auditory scene. Auditory scene analysis (ASA) is the process in which the auditory system segregates an auditory scene into streams corresponding to different sources. Segmentation is an important stage of ASA where an auditory scene is decomposed into segments, each of which contains signal mainly from one source. We propose a system for auditory segmentation based on analyzing onsets and offsets of auditory events. Our system first detects onsets and offsets, and then generates segments by matching corresponding onsets and offsets. This is achieved through a multiscale approach based on scale-space theory. Systematic evaluation shows that much target speech, including unvoiced speech, is correctly segmented, and target speech and interference are well separated into different segments.
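The onset/offset matching idea described above can be sketched roughly as follows. This is a minimal single-scale illustration, not the paper's actual method (which performs a full multiscale scale-space analysis across an auditory filterbank); the function names, the Gaussian smoothing scale `sigma`, and the `threshold` value are illustrative assumptions:

```python
import numpy as np

def detect_onsets_offsets(envelope, sigma=2.0, threshold=0.05):
    """Find onset/offset candidates as extrema in the derivative of a
    Gaussian-smoothed intensity envelope (illustrative parameters)."""
    # Gaussian kernel at scale sigma; in a scale-space approach this
    # detection would be repeated over a range of sigma values.
    radius = int(4 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    smoothed = np.convolve(envelope, kernel, mode="same")
    d = np.gradient(smoothed)
    onsets, offsets = [], []
    for i in range(1, len(d) - 1):
        # Onset candidate: local maximum of the derivative above threshold.
        if d[i] > threshold and d[i] >= d[i - 1] and d[i] > d[i + 1]:
            onsets.append(i)
        # Offset candidate: local minimum of the derivative below -threshold.
        if d[i] < -threshold and d[i] <= d[i - 1] and d[i] < d[i + 1]:
            offsets.append(i)
    return onsets, offsets

def match_segments(onsets, offsets):
    """Pair each onset with the next following offset to form a segment."""
    segments, j = [], 0
    for on in onsets:
        while j < len(offsets) and offsets[j] <= on:
            j += 1
        if j < len(offsets):
            segments.append((on, offsets[j]))
            j += 1
    return segments
```

For example, an envelope that ramps up, holds, and ramps down yields one onset near the rising edge and one offset near the falling edge, which are matched into a single segment.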
Cite as: Hu, G., Wang, D. (2004) Auditory segmentation based on event detection. Proc. ITRW on Statistical and Perceptual Audio Processing (SAPA 2004), paper 62
@inproceedings{hu04_sapa,
  author    = {Guoning Hu and DeLiang Wang},
  title     = {{Auditory segmentation based on event detection}},
  year      = {2004},
  booktitle = {Proc. ITRW on Statistical and Perceptual Audio Processing (SAPA 2004)},
  pages     = {paper 62}
}