ISCA Archive AVSP 2001

Audio-visual recognition of spectrally reduced speech

Frederic Berthommier

Perceptual experiments on audio-visual consonant recognition based on spectrally reduced speech (SRS) were carried out with coherent and incoherent (McGurk) audio-visual pairs. The main interest of SRS in four sub-bands is that it partially suppresses the information transmitted about the place of articulation. The integration of manner, restricted to the fricative/occlusive contrast, is also examined, and a new 'cross-manner' combination is tested. As expected, we observe good audio-visual complementarity for SRS and a high proportion of McGurk responses, but new effects are also observed. To interpret human confusions about place of articulation, the Bayesian model proposed by Massaro and Stork [8] is compared with a new place identification model based on averaging and on the separate identification of articulatory features. This decomposition is a promising approach for developing multi-stream speech recognition models.
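To make the contrast between the two families of integration rule concrete, here is a minimal sketch, not the models evaluated in the paper. It assumes that each modality yields a probability-like support value for every place category, that Bayesian-style fusion multiplies those supports and renormalizes, and that averaging fusion takes their arithmetic mean. The function names, the three place categories, and all numeric values are hypothetical and chosen only for illustration; the separate identification of articulatory features mentioned above is not modelled here.

```python
def normalize(scores):
    """Rescale non-negative scores so they sum to 1."""
    total = sum(scores.values())
    return {label: value / total for label, value in scores.items()}


def bayesian_fusion(audio, visual):
    """Multiplicative fusion: product of the unimodal supports, renormalized.

    Assumption: this stands in for a Bayesian-style integration rule.
    """
    return normalize({label: audio[label] * visual[label] for label in audio})


def averaging_fusion(audio, visual):
    """Additive fusion: arithmetic mean of the unimodal supports, renormalized."""
    return normalize({label: 0.5 * (audio[label] + visual[label]) for label in audio})


# Hypothetical supports for an incoherent (McGurk-like) pair:
# the audio favours a bilabial place, the video favours a velar place.
audio = {"bilabial": 0.60, "alveolar": 0.30, "velar": 0.10}
visual = {"bilabial": 0.05, "alveolar": 0.35, "velar": 0.60}

bayes = bayesian_fusion(audio, visual)
average = averaging_fusion(audio, visual)

print("multiplicative:", max(bayes, key=bayes.get), bayes)
print("averaging:     ", max(average, key=average.get), average)
```

With these illustrative numbers, the multiplicative rule favours the intermediate (alveolar) category, while the averaging rule favours the category preferred by the video. The two rules can therefore predict different confusion patterns from the same unimodal responses, which is what allows such models to be compared against human data.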

Cite as: Berthommier, F. (2001) Audio-visual recognition of spectrally reduced speech. Proc. Auditory-Visual Speech Processing, 183-188

  author={Frederic Berthommier},
  title={{Audio-visual recognition of spectrally reduced speech}},
  booktitle={Proc. Auditory-Visual Speech Processing},