INTERSPEECH 2014
15th Annual Conference of the International Speech Communication Association

Singapore
September 14-18, 2014

Interplay of Informational Content and Energetic Masking in Speech Perception in Noise

Vincent Aubanel, Chris Davis, Jeesun Kim

University of Western Sydney, Australia

It seems plausible that different regions of the speech signal convey different amounts of information. Identifying which aspects of the signal convey information is important for understanding speech perception, particularly in noisy environments. The so-called cochlea-scaled entropy (CSE) measure is an index of spoken information based on the distribution of spectral energy over consonant/vowel time scales, and it is defined independently of potential noise corruption. In speech in noise, however, energetic masking distorts information because it suppresses certain spectro-temporal regions. This study explored the interplay of informational content (defined by CSE) and energetic masking in explaining listeners' ability to understand speech in noise. Using a priming paradigm, mixtures of speech and speech-shaped noise were presented to listeners in an identification task. Sentences were preceded by previews consisting of either low or high informational content. Both preview types yielded a similar performance increase of around 19%. Although the low-information preview transmitted less target information, it had a greater overlap with the more energetic regions of the target sentence (i.e., those that were less masked). This could explain why both preview types were effective, and it calls for a consideration of both measures in understanding speech recognition in noise.
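The abstract does not spell out how CSE is computed. As a rough illustration only, the sketch below assumes a CSE-like index obtained by measuring the spectral change between adjacent short-time slices and accumulating it over a sliding window. The function names, the window length, and the toy input are hypothetical and are not taken from the paper; a real implementation would first derive the slices from an auditory (cochlea-scaled) filterbank.

```python
import math


def slice_distances(slices):
    """Euclidean distance between each pair of adjacent spectral slices.

    `slices` is a list of equal-length lists, one per time slice, holding
    per-band energies (assumed to be already cochlea-scaled).
    """
    return [
        math.sqrt(sum((b - a) ** 2 for a, b in zip(prev, cur)))
        for prev, cur in zip(slices, slices[1:])
    ]


def windowed_change(distances, window=5):
    """Sum the distances inside a sliding boxcar window.

    The window length is an assumption made for this sketch.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(distances[i:i + window])
        for i in range(len(distances) - window + 1)
    ]


# Toy input: four slices with two bands each (hypothetical values).
toy_slices = [[0.0, 0.0], [3.0, 4.0], [3.0, 4.0], [0.0, 0.0]]
toy_distances = slice_distances(toy_slices)
toy_index = windowed_change(toy_distances, window=2)
```

Under these assumptions, windows with a larger summed change would be treated as carrying more information, and windows with a smaller summed change as carrying less.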

Bibliographic reference.  Aubanel, Vincent / Davis, Chris / Kim, Jeesun (2014): "Interplay of informational content and energetic masking in speech perception in noise", In INTERSPEECH-2014, 2046-2049.