Auditory-Visual Speech Processing
(AVSP 2001)

September 7-9, 2001
Aalborg, Denmark

A Hybrid ANN/HMM Audio-Visual Speech Recognition System

Martin Heckmann (1), Frederic Berthommier (2), Kristian Kroschel (1)

(1) Institut für Nachrichtentechnik, Universität Karlsruhe, Germany
(2) Institut de la Communication Parlée (ICP), Grenoble, France

In this paper we present a system for audio-visual speech recognition based on a hybrid Artificial Neural Network/Hidden Markov Model (ANN/HMM) approach. Setting up the system required recording a new audio-visual database, and we describe how this database was recorded and labeled. The fusion of audio and video data is a key aspect of the paper. We consider three conditions: only the audio data is reliable, only the video data is reliable, and both are equally reliable. We present a method for combining the video and audio information based on these three conditions, and we implement it as an automatic fusion scheme that depends on the noise level in the audio channel. The performance of the complete system is demonstrated using two types of additive noise at varying SNR.
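
The abstract does not state the fusion rule itself, so the following is only an illustrative sketch of one common way such a noise-dependent fusion could look, not the method of the paper. It assumes that each stream yields per-class posterior probabilities and combines them with a weighted geometric mean. The function names, the linear SNR-to-weight mapping, and the 0 dB and 30 dB limits are all hypothetical choices made for this example.

```python
import numpy as np


def fuse_posteriors(p_audio, p_video, weight):
    """Combine two posterior vectors as p_audio**w * p_video**(1 - w), renormalized.

    weight = 1.0 -> only the audio stream is trusted
    weight = 0.0 -> only the video stream is trusted
    weight = 0.5 -> both streams are trusted equally
    (Assumed fusion rule for illustration; not taken from the paper.)
    """
    p_audio = np.asarray(p_audio, dtype=float)
    p_video = np.asarray(p_video, dtype=float)
    eps = 1e-12  # guards against log(0)
    log_p = weight * np.log(p_audio + eps) + (1.0 - weight) * np.log(p_video + eps)
    log_p -= log_p.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=-1, keepdims=True)


def weight_from_snr(snr_db, low_db=0.0, high_db=30.0):
    """Hypothetical mapping from audio SNR (dB) to the audio weight.

    Linear ramp between low_db and high_db, clipped to [0, 1].
    The limits are placeholders, not values from the paper.
    """
    return float(np.clip((snr_db - low_db) / (high_db - low_db), 0.0, 1.0))


if __name__ == "__main__":
    p_a = [0.7, 0.2, 0.1]  # made-up audio posteriors over three classes
    p_v = [0.2, 0.6, 0.2]  # made-up video posteriors over the same classes
    for snr in (-5.0, 15.0, 40.0):
        w = weight_from_snr(snr)
        print(snr, w, fuse_posteriors(p_a, p_v, w))
```

With this mapping, a very noisy audio channel drives the weight to 0 so that the fused result follows the video stream, a clean channel drives it to 1 so that it follows the audio stream, and intermediate SNR values blend the two.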

