Auditory-Visual Speech Processing (AVSP'99)

August 7-10, 1999
Santa Cruz, CA, USA

Audio-Visual Sensorfusion with Neural Architectures

B. Talle, A. Wichert

Department of Neural Information Processing, University of Ulm, Germany

In this paper we present a new word recognition system for monosyllabic words. It consists of two types of neural networks and makes it easy to investigate three different fusion architectures for audio-visual signals. Furthermore, two kinds of preprocessing are compared: besides low-level data, a linear discriminant analysis is used to reduce the dimensionality of the audio and visual signals. Our cross-validation experiments show a slight advantage for an intermediate fusion model over an early fusion model that uses jointly preprocessed audio and visual data.
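
To make the distinction between fusion levels concrete, the following is a minimal sketch and not the authors' system. The abstract does not describe the network types, layer sizes, or the third architecture. Here the third is assumed to be late (decision-level) fusion, and all function names, dimensions, and the randomly initialised, untrained weights are hypothetical. Early fusion concatenates the audio and visual feature vectors before a single network, intermediate fusion concatenates per-modality hidden activations, and late fusion averages per-modality class posteriors.

```python
import math
import random

def linear(x, w):
    # One weight row per output unit; no bias term, to keep the sketch short.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def hidden(x, w):
    # Hidden layer with tanh activation.
    return [math.tanh(v) for v in linear(x, w)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def init(n_out, n_in, rng):
    # Random, untrained weights (illustration only).
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]

def make_params(n_audio, n_video, n_hidden, n_classes, seed=0):
    rng = random.Random(seed)
    return {
        "early_h": init(n_hidden, n_audio + n_video, rng),
        "early_o": init(n_classes, n_hidden, rng),
        "audio_h": init(n_hidden, n_audio, rng),
        "video_h": init(n_hidden, n_video, rng),
        "inter_o": init(n_classes, 2 * n_hidden, rng),
        "audio_o": init(n_classes, n_hidden, rng),
        "video_o": init(n_classes, n_hidden, rng),
    }

def early_fusion(audio, video, p):
    # Concatenate the input features, then run a single network.
    joint = audio + video
    return softmax(linear(hidden(joint, p["early_h"]), p["early_o"]))

def intermediate_fusion(audio, video, p):
    # Separate hidden layers per modality, joint output layer.
    joint = hidden(audio, p["audio_h"]) + hidden(video, p["video_h"])
    return softmax(linear(joint, p["inter_o"]))

def late_fusion(audio, video, p):
    # Separate classifiers per modality, then average the class posteriors.
    pa = softmax(linear(hidden(audio, p["audio_h"]), p["audio_o"]))
    pv = softmax(linear(hidden(video, p["video_h"]), p["video_o"]))
    return [(a + v) / 2.0 for a, v in zip(pa, pv)]

if __name__ == "__main__":
    params = make_params(n_audio=4, n_video=3, n_hidden=5, n_classes=2)
    audio = [0.1, -0.2, 0.3, 0.0]   # made-up audio feature vector
    video = [0.5, 0.4, -0.1]        # made-up visual feature vector
    for fuse in (early_fusion, intermediate_fusion, late_fusion):
        print(fuse.__name__, fuse(audio, video, params))
```

Each function returns a probability vector over the word classes, so the three variants can be compared under the same cross-validation protocol. The feature vectors passed in could be either low-level data or a reduced representation such as the output of a linear discriminant analysis.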

Bibliographic reference. Talle, B. / Wichert, A. (1999): "Audio-visual sensorfusion with neural architectures", In AVSP-1999, paper #17.