Bottom-up, or saliency-driven, attention allows the brain to detect nonspecific conspicuous targets in cluttered scenes before fully processing and recognizing them. Here, a novel, biologically plausible auditory saliency map is presented to model such saliency-based auditory attention. Multi-scale auditory features are extracted based on the processing stages of the central auditory system and combined into a single master saliency map. The usefulness of the proposed auditory saliency map for detecting prominent syllable and word locations in speech is tested in an unsupervised manner. When evaluated on broadcast news-style read speech from the BU Radio News Corpus, the model achieves 75.9% accuracy at the syllable level and 78.1% accuracy at the word level. These results compare well with results reported for human performance.
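As a rough illustration of the general idea of combining multi-scale feature maps into one master saliency map, the following minimal sketch computes center-surround contrast on a time-frequency representation at several scales, normalizes each contrast map, and sums them. It is not the model described in the abstract: the box-filter smoothing, the scale pairs, the threshold, and the function names (`box_smooth`, `normalize`, `saliency_map`, `salient_frames`) are all assumptions chosen for illustration.

```python
import numpy as np


def box_smooth(x, width):
    """Moving-average smoothing along both axes (assumed stand-in for a filter bank)."""
    kernel = np.ones(width) / width
    out = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="same"), 0, x)
    out = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="same"), 1, out)
    return out


def normalize(m):
    """Rescale a map to [0, 1]; a completely flat map becomes all zeros."""
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)


def saliency_map(spectrogram, scales=((1, 5), (3, 9), (5, 15))):
    """Sum normalized center-surround contrast maps over several (assumed) scales.

    Each axis of `spectrogram` must be at least as long as the largest surround width.
    """
    maps = []
    for center, surround in scales:
        # Contrast between a fine (center) and a coarse (surround) view of the input.
        contrast = np.abs(box_smooth(spectrogram, center) - box_smooth(spectrogram, surround))
        maps.append(normalize(contrast))
    return normalize(np.sum(maps, axis=0))


def salient_frames(saliency, threshold=0.5):
    """Indices of time frames (columns) whose peak saliency exceeds the threshold."""
    per_frame = normalize(saliency.max(axis=0))
    return np.flatnonzero(per_frame > threshold)
```

Applied to an array of shape (frequency bins, time frames), `saliency_map` returns a map of the same shape, and `salient_frames` picks out candidate time locations without any labeled training data, which mirrors the unsupervised setting only in spirit.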
Bibliographic reference. Kalinli, Ozlem / Narayanan, Shrikanth S. (2007): "A saliency-based auditory attention model with applications to unsupervised prominent syllable detection in speech", In INTERSPEECH-2007, 1941-1944.