8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Cross-Modal Informational Masking Due to Mismatched Audio Cues in a Speechreading Task

Douglas S. Brungart (1), Brian D. Simpson (1), Alex Kordik (2)

(1) Air Force Research Laboratory, USA
(2) Sytronics Inc., USA

Although most known examples of cross-modal interactions in audio-visual speech perception involve a dominant visual signal that modifies the audio signal the observer appears to hear, there may also be cases where an audio signal alters the visual image the observer sees. In the first experiment, we examined how different distracting audio signals affected an observer's ability to speechread a color and number combination from a visual speech stimulus. When the distracting signal was noise, time-reversed speech, or irrelevant continuous speech, speechreading performance was unaffected. However, when the distracting audio signal was speech that followed the same general syntax as the target speech but contained a different color and number combination, speechreading performance was dramatically reduced. This suggests that the amount of interference an audio signal causes in a speechreading task depends strongly on the semantic similarity of the target and masking phrases. The amount of interference did not, however, depend on the apparent similarity between the audio speech signal and the visible talker: masking phrases spoken by a talker of a different sex from the visible talker interfered with the speechreading task nearly as much as masking phrases spoken by the same talker who appeared in the visual stimulus. A second experiment, which examined the effects of desynchronizing the audio and visual signals, found that the interference caused by the audio phrase decreased when the phrase was advanced or delayed in time relative to the visual target, but that time shifts as large as 1 s were required before performance approached the level achieved with no audio signal. The results of these experiments are consistent with a kind of cross-modal "informational masking" that occurs when listeners who see one word and hear another cannot correctly determine which word was present in the visual stimulus.


Bibliographic reference: Brungart, Douglas S. / Simpson, Brian D. / Kordik, Alex (2003): "Cross-modal informational masking due to mismatched audio cues in a speechreading task", In EUROSPEECH-2003, 1041-1044.