Auditory attention is a complex mechanism that involves processing the low-level acoustic features of sound together with higher-level cognitive rules. In this paper, a novel method is proposed that combines biologically inspired auditory attention cues with higher-level lexical and syntactic information to model task-dependent influences on a given task. Feature maps are extracted from sound at multiple scales by mimicking the processing stages of the human auditory system, and are converted to low-level auditory gist features. The auditory attention model then biases the gist features according to the task to maximize target detection. The top-down, task-dependent influence of lexical and syntactic information is incorporated into the model using a probabilistic approach. The combined model is tested on detecting prominent syllables in speech using the BU Radio News Corpus. It achieves 88% prominence detection accuracy at the syllable level, which is comparable to reported human performance on this task.
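The abstract does not spell out the probabilistic combination, so the following is only a minimal sketch of one common way such a fusion could be done, not the authors' formulation. It assumes each information stream (acoustic gist features, lexical, syntactic) yields its own posterior probability that a syllable is prominent, and that the streams are conditionally independent given the class. The function name `combine_posteriors`, its parameters, and the numbers in the usage line are hypothetical and do not come from the paper.

```python
import math


def _logit(p: float, eps: float = 1e-9) -> float:
    """Log-odds of a probability, clamped away from 0 and 1 for stability."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def combine_posteriors(stream_posteriors: list[float], prior: float) -> float:
    """Fuse per-stream posteriors P(prominent | stream) into one posterior.

    Assumption (not from the paper): streams are conditionally independent
    given the class, so in odds form
        O(c | x_1..x_n) = O(c) * prod_i [ O(c | x_i) / O(c) ].
    """
    prior_logit = _logit(prior)
    # Each stream contributes its evidence relative to the shared prior.
    fused_logit = prior_logit + sum(
        _logit(p) - prior_logit for p in stream_posteriors
    )
    return 1.0 / (1.0 + math.exp(-fused_logit))


# Hypothetical usage: acoustic, lexical, and syntactic posteriors for one syllable.
p_prominent = combine_posteriors([0.8, 0.6, 0.7], prior=0.35)
is_prominent = p_prominent > 0.5
```

Working in log-odds keeps the fusion numerically stable and makes each stream's contribution additive; a stream whose posterior equals the prior adds no evidence.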
Bibliographic reference. Kalinli, Ozlem / Narayanan, Shrikanth S. (2008): "Combining task-dependent information with auditory attention cues for prominence detection in speech", In INTERSPEECH-2008, 1064-1067.