ISCA Archive AVSP 2013

Integration of acoustic and visual cues in prominence perception

Hansjörg Mixdorff, Angelika Hönemann, Sascha Fagel

This study concerns the perception of prominence in auditory-visual speech. We constructed A/V stimuli from five-syllable sentences in which every syllable was a candidate for receiving stress. All syllables were of uniform length, and the F0 contours were manipulated using the Fujisaki model, moving an F0 peak from the beginning to the end of the utterance. The peak was aligned either with the center of a syllable or with the boundary between syllables, yielding a total of nine positions. Likewise, a video showing the upper part of a speaker’s face exhibiting a single eyebrow raise was aligned with the audio, yielding nine positions for the visual cue, with the maximum displacement of the eyebrows coinciding with syllable centers or boundaries. Another series of stimuli was produced with head nods as the visual cue. In addition, stimuli with constant F0, with or without video, were created. Twenty-two native speakers of German rated the strength of each of the five syllables in a stimulus on a scale from 1 to 3. Results show that the acoustic prominence outweighs the visual one, and that the integration of both in a single syllable is strongest when the movement and the F0 peak are both aligned with the center of the syllable. However, F0 peaks aligned with the right boundary of the accented syllable, as well as visual peaks aligned with the left one, also boost prominence considerably. Nods had an effect similar in magnitude to that of eyebrow movements; however, results suggest that they need to be aligned with the right boundary of the syllable rather than the left one.
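The count of nine alignment positions follows from five syllable centers plus the four boundaries between adjacent syllables. The short sketch below enumerates them on a normalized time axis. It assumes one time unit per syllable (the abstract states only that syllables were of uniform length), and the function name and labels are hypothetical, introduced purely for illustration; it is not the authors' stimulus-generation procedure.

```python
def alignment_positions(n_syllables=5):
    """Enumerate candidate peak positions for an utterance of equal-length syllables.

    Assumption: each syllable lasts one normalized time unit, so syllable i
    (1-based) spans [i - 1, i]. Positions are syllable centers and the
    boundaries between adjacent syllables, in temporal order.
    """
    positions = []
    for i in range(1, n_syllables + 1):
        # Center of syllable i.
        positions.append((f"center of syllable {i}", i - 0.5))
        # Boundary between syllable i and i + 1 (none after the last syllable).
        if i < n_syllables:
            positions.append((f"boundary {i}|{i + 1}", float(i)))
    return positions


if __name__ == "__main__":
    for label, t in alignment_positions():
        print(f"{t:>4.1f}  {label}")
```

The same set of positions applies to the acoustic cue (F0 peak) and to the visual cue (maximum eyebrow displacement), since both were aligned with syllable centers or boundaries.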

Index Terms: Prominence, auditory-visual integration, F0 modeling

Cite as: Mixdorff, H., Hönemann, A., Fagel, S. (2013) Integration of acoustic and visual cues in prominence perception. Proc. Auditory-Visual Speech Processing, 111-116

  author={Hansjörg Mixdorff and Angelika Hönemann and Sascha Fagel},
  title={{Integration of acoustic and visual cues in prominence perception}},
  booktitle={Proc. Auditory-Visual Speech Processing},