In this paper we present a system for audio-visual speech recognition based on a hybrid Artificial Neural Network/Hidden Markov Model (ANN/HMM) approach. Setting up the system required recording a new audio-visual database, and we describe how it was recorded and labeled. The fusion of audio and video data is a key aspect of the paper. We consider three conditions: only the audio data is reliable, only the video data is reliable, and both are equally reliable. We present a method for combining the video and audio information based on these three conditions, and develop an implementation that performs the fusion automatically depending on the noise level in the audio channel. The performance of the complete system is demonstrated using two types of additive noise at varying signal-to-noise ratios (SNR).
Cite as: Heckmann, M., Berthommier, F., Kroschel, K. (2001) A hybrid ANN/HMM audio-visual speech recognition system. Proc. Auditory-Visual Speech Processing, 189-194
@inproceedings{heckmann01_avsp,
  author={Martin Heckmann and Frederic Berthommier and Kristian Kroschel},
  title={{A hybrid ANN/HMM audio-visual speech recognition system}},
  year=2001,
  booktitle={Proc. Auditory-Visual Speech Processing},
  pages={189--194}
}