Auditory-Visual Speech Processing (AVSP'99)
August 7-10, 1999
In this paper we present a new recognition system for monosyllabic words, built from two types of neural networks, which allows straightforward investigation of three different fusion architectures for audio-visual signals. Furthermore, two kinds of preprocessing are compared: besides low-level data, a linear discriminant analysis is applied to the audio and visual signals to reduce their dimensionality. Our cross-validation experiments show a slight advantage for an intermediate fusion model over an early fusion model that uses jointly preprocessed audio and visual data.
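The distinction between early and intermediate fusion can be sketched as follows. This is a minimal illustration, not the paper's actual architecture: all dimensions, weights, and layer choices below are hypothetical, and the networks are untrained. Early fusion concatenates the audio and visual feature vectors before a single shared network; intermediate fusion gives each modality its own subnetwork and merges the hidden representations.

```python
import numpy as np

rng = np.random.default_rng(0)

def hidden_layer(x, w, b):
    # Simple tanh hidden layer: x @ w + b, squashed elementwise
    return np.tanh(x @ w + b)

# Hypothetical feature dimensions (not taken from the paper)
d_audio, d_visual, d_hidden, n_classes = 12, 8, 6, 10

audio = rng.normal(size=(1, d_audio))    # one audio feature vector
visual = rng.normal(size=(1, d_visual))  # one visual feature vector

# Early fusion: concatenate the modalities, then one shared network
w_e = rng.normal(size=(d_audio + d_visual, d_hidden))
w_eo = rng.normal(size=(d_hidden, n_classes))
fused_input = np.concatenate([audio, visual], axis=1)
early_scores = hidden_layer(fused_input, w_e, np.zeros(d_hidden)) @ w_eo

# Intermediate fusion: per-modality subnetworks, merged at the hidden layer
w_a = rng.normal(size=(d_audio, d_hidden))
w_v = rng.normal(size=(d_visual, d_hidden))
w_io = rng.normal(size=(2 * d_hidden, n_classes))
h_audio = hidden_layer(audio, w_a, np.zeros(d_hidden))
h_visual = hidden_layer(visual, w_v, np.zeros(d_hidden))
intermediate_scores = np.concatenate([h_audio, h_visual], axis=1) @ w_io

print(early_scores.shape, intermediate_scores.shape)  # both (1, 10)
```

Either way, the output is a score vector over the word classes; the architectures differ only in where the two sensor streams are merged.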
Bibliographic reference. Talle, B. / Wichert, A. (1999): "Audio-visual sensorfusion with neural architectures", In AVSP-1999, paper #17.