9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Combining Task-Dependent Information with Auditory Attention Cues for Prominence Detection in Speech

Ozlem Kalinli, Shrikanth S. Narayanan

University of Southern California, USA

Auditory attention is a highly complex mechanism that involves the processing of low-level acoustic features of sound together with higher-level cognitive rules. In this paper, a novel method is proposed that combines biologically inspired auditory attention cues with higher-level lexical and syntactic information in order to model task-dependent influences on a given task. Feature maps are extracted from sound at multiple scales by mimicking the processing stages of the human auditory system, and are then converted to low-level auditory gist features. The auditory attention model then biases the gist features according to the task in order to maximize target detection. The top-down, task-dependent influence of lexical and syntactic information is incorporated into the model using a probabilistic approach. The combined model is tested on detecting prominent syllables in speech using the Boston University (BU) Radio News Corpus. The model achieves 88% prominence detection accuracy at the syllable level, which is comparable to reported human performance on this task.
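The abstract does not spell out how the acoustic and the lexical/syntactic evidence are combined probabilistically. As a minimal sketch only, and not the authors' formulation, the Python below fuses two per-syllable posteriors for the class "prominent" under the assumption that the two evidence sources are conditionally independent given the class. The function name `combine_posteriors`, its parameters, and the default prior of 0.5 are all hypothetical and chosen for illustration.

```python
def combine_posteriors(p_acoustic, p_lexical, prior=0.5):
    """Fuse two posteriors P(prominent | evidence) into one.

    Assumption (not taken from the paper): the acoustic and lexical
    evidence are conditionally independent given the class, so that
    P(c | a, l) is proportional to P(c | a) * P(c | l) / P(c).
    """
    for p in (p_acoustic, p_lexical, prior):
        if not 0.0 < p < 1.0:
            raise ValueError("probabilities must lie strictly between 0 and 1")

    # Unnormalized score for the "prominent" class.
    pos = p_acoustic * p_lexical / prior
    # Unnormalized score for the "non-prominent" class.
    neg = (1.0 - p_acoustic) * (1.0 - p_lexical) / (1.0 - prior)
    # Normalize so that the two class scores sum to one.
    return pos / (pos + neg)


if __name__ == "__main__":
    # Hypothetical posteriors for a single syllable.
    fused = combine_posteriors(p_acoustic=0.8, p_lexical=0.7)
    print(f"fused P(prominent) = {fused:.3f}")
```

Under this assumption, two sources that both favor prominence reinforce each other, while an uninformative source (a posterior equal to the prior) leaves the other source's estimate unchanged.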


Bibliographic reference. Kalinli, Ozlem / Narayanan, Shrikanth S. (2008): "Combining task-dependent information with auditory attention cues for prominence detection in speech", in INTERSPEECH-2008, 1064-1067.