ESCA Workshop on Audio-Visual Speech Processing (AVSP'97)

September 26-27, 1997
Rhodes, Greece

Audio-Visual Speech Perception Without Traditional Speech Cues: A Second Report

Robert E. Remez (1), Jennifer M. Fellowes (1), David B. Pisoni (2), Winston D. Goh (2), Philip E. Rubin (3)

(1) Barnard College, New York, NY, USA
(2) Indiana University, Bloomington, IN, USA
(3) Haskins Laboratories, New Haven, CT, USA

Theoretical and practical motives alike have prompted investigations of multimodal speech perception. Theoretically, such studies extend the explanation of perceptual organization beyond the familiar modality-bound accounts deriving from Gestalt psychology. Practically, existing perceptual accounts fail to explain the proficiency of multimodal speech perception achieved with an electrocochlear prosthesis for hearing. Accordingly, our research sought improved measures of the audiovisual integration of videotaped faces and selected acoustic constituents of speech signals, using an acoustic signal that departs from the natural spectral properties of speech. A single sinewave tone accompanied a video image of an articulating face; the frequency and amplitude of the phonatory cycle, or of one of the three lowest oral formants, supplied the pattern for the sinewave signal. Our results showed a distinct advantage for the condition pairing the video with a sinewave replicating the second formant, despite its unnatural timbre and its presentation in acoustic isolation from the balance of the speech signal.
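To illustrate the kind of signal used in such studies, the following is a minimal sketch of how a single time-varying sinusoid can be generated from frame-level frequency and amplitude tracks (for example, an estimated second-formant contour). It is not the authors' synthesis procedure; the function name, frame rate, and the example contour are assumptions introduced purely for illustration.

import numpy as np

def sinewave_replica(freqs_hz, amps, frame_rate_hz=100, sample_rate_hz=16000):
    """Synthesize one time-varying sinusoid from frame-level
    frequency and amplitude tracks (e.g., an estimated F2 contour)."""
    samples_per_frame = sample_rate_hz // frame_rate_hz
    n_samples = len(freqs_hz) * samples_per_frame

    # Interpolate the frame-level tracks up to sample resolution.
    frame_times = np.arange(len(freqs_hz)) / frame_rate_hz
    sample_times = np.arange(n_samples) / sample_rate_hz
    f = np.interp(sample_times, frame_times, freqs_hz)
    a = np.interp(sample_times, frame_times, amps)

    # Integrate instantaneous frequency to obtain phase, then synthesize.
    phase = 2 * np.pi * np.cumsum(f) / sample_rate_hz
    return a * np.sin(phase)

# Example: a tone following a hypothetical F2 contour rising 1200 -> 1800 Hz.
f2 = np.linspace(1200, 1800, 50)   # 50 frames at 100 frames/s = 0.5 s
amp = np.hanning(50)               # smooth amplitude envelope
tone = sinewave_replica(f2, amp)

Such a tone preserves the temporal variation of a single formant while discarding the broadband spectral structure of natural speech, which is what makes it useful for probing audiovisual integration without traditional speech cues.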


Bibliographic reference.  Remez, Robert E. / Fellowes, Jennifer M. / Pisoni, David B. / Goh, Winston D. / Rubin, Philip E. (1997): "Audio-visual speech perception without traditional speech cues: a second report", In AVSP-1997, 73-76.