Auditory-Visual Speech Processing 2007 (AVSP2007)
Kasteel Groenendaal, Hilvarenbeek, The Netherlands
Modern signal processing techniques make it possible to speed up or slow down the apparent speaking rate of an audio-visual (AV) speech stimulus, but little is known about the effect this processing might have on the intelligibility of AV speech signals. In this experiment, AV recordings of phrases from the Modified Rhyme Test (MRT) were accelerated or decelerated by first changing the duration of the audio signal using the PSOLA speech processing algorithm (PRAAT) and then changing the frame rate of the AVI file to maintain synchronization of the audio and visual stimuli. The original speech phrases were recorded at one of three speaking rates: a fast rate (roughly 5 syllables per second (syl/s)), a normal conversational rate (3.3 syl/s), or a slow rate (1.7 syl/s). The results of a preliminary experiment showed that conversational-rate AV recordings that were shifted in speed to match the slow or fast recordings produced the same audio and audiovisual intelligibility levels as the original recordings. However, some degradation in performance occurred when the fast recordings were slowed down or the slow recordings were sped up. In the main experiment, the phrases were processed to set their speaking rates to eight different fixed values ranging from 0.6 syl/s to 20 syl/s. The results show that AV advantages were preserved at speaking rates as fast as 12.5 syl/s, but that they disappeared when the rate was increased to 20 syl/s. Notably, the results also failed to show any improvement in AV performance for phrases presented slower than their original speaking rates.
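The rate-shifting arithmetic described above can be illustrated with a short Python sketch. It is not the authors' code: the function name, the 30 frames-per-second source rate, and the 2-second phrase duration are assumptions chosen only for illustration. It shows that if the audio duration is divided by the speed-up factor and the video frame rate is multiplied by the same factor, every frame keeps its position relative to the audio.

```python
def rescale_plan(original_rate, target_rate, audio_duration_s, original_fps):
    """Return (speedup, new_audio_duration_s, new_fps) for a rate change.

    original_rate and target_rate are speaking rates in syllables per second.
    audio_duration_s and original_fps are hypothetical properties of the clip.
    """
    if original_rate <= 0 or target_rate <= 0:
        raise ValueError("speaking rates must be positive")
    # Factor > 1 means acceleration, factor < 1 means deceleration.
    speedup = target_rate / original_rate
    # The time-scaled audio (e.g. from PSOLA) lasts 1/speedup as long.
    new_audio_duration_s = audio_duration_s / speedup
    # Playing the same frames at a scaled frame rate keeps them aligned
    # with the shortened or lengthened audio.
    new_fps = original_fps * speedup
    return speedup, new_audio_duration_s, new_fps


# Example: shift a conversational-rate phrase (3.3 syl/s) to the fast rate
# (5 syl/s), assuming a 2-second clip recorded at 30 frames per second.
speedup, duration, fps = rescale_plan(3.3, 5.0, 2.0, 30.0)
print(f"speed-up factor: {speedup:.3f}")
print(f"new audio duration: {duration:.3f} s, new frame rate: {fps:.2f} fps")
```

The total number of video frames (duration times frame rate) is unchanged by this transformation, which is what keeps the audio and visual streams synchronized.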
Bibliographic reference. Brungart, Douglas S. / Wassenhove, Virginie van / Brandewie, Eugene / Romigh, Griffin (2007): "The effects of temporal acceleration and deceleration on AV speech perception", In AVSP-2007, paper P27.