ISCA Archive AVSP 2003
Effects of image distortions on audio-visual speech recognition

Martin Heckmann, Frédéric Berthommier, Christophe Savariaux, Kristian Kroschel

Audio-visual speech recognition yields significant improvements over audio-only recognition, especially when the audio signal is corrupted by noise, a result that many researchers have reproduced. However, little research has been done on how audio-visual recognition behaves when the video signal is also degraded. In this article we investigate the consequences of different types of image degradation, namely white noise, JPEG compression, and errors in the localization of the mouth region, on the audio-visual recognition process. The first question we address is how noise in the video stream influences the recognition scores. To this end we added noise to both the audio and the video signal at different SNR levels. The second question is how the adaptation of the fusion parameter, which controls the contribution of the audio and video streams to the recognition, is affected by the additional noise in the video stream. We compare the results obtained when the fusion parameter is adapted to the noise in both the audio and video streams with those obtained when it is adapted only to the noise in the audio stream, so that a clean video stream is assumed. For the second type of test we use an automatic adaptation of the fusion parameter based on the entropy of the a-posteriori probabilities from the audio stream.
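The two ingredients named in the abstract, adding noise at a chosen SNR and deriving a fusion parameter from the entropy of the audio posteriors, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the linear mapping from entropy to fusion weight, and the weighted log-linear combination of the two streams are all assumptions made for illustration; the abstract states only that the adaptation is based on the entropy of the a-posteriori probabilities from the audio stream.

```python
import math
import random


def add_white_noise(signal, snr_db, rng=None):
    """Return the signal plus white Gaussian noise scaled to the requested SNR in dB."""
    rng = rng or random.Random(0)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def mean_posterior_entropy(posteriors):
    """Average Shannon entropy (in nats) over frames.

    Each frame is a list of class posteriors that sums to 1.
    """
    total = 0.0
    for frame in posteriors:
        total -= sum(p * math.log(p) for p in frame if p > 0)
    return total / len(posteriors)


def fusion_weight(entropy, num_classes):
    """Hypothetical mapping from entropy to the audio weight.

    Assumption: the weight falls linearly from 1 (zero entropy, confident
    audio stream) to 0 (maximum entropy log(num_classes), uninformative audio).
    """
    h_max = math.log(num_classes)
    return min(1.0, max(0.0, 1.0 - entropy / h_max))


def fuse(audio_log_post, video_log_post, lam):
    """Assumed weighted log-linear combination of per-class log posteriors."""
    return [lam * a + (1.0 - lam) * v
            for a, v in zip(audio_log_post, video_log_post)]
```

Under these assumptions, a noisy audio stream produces flatter posteriors, hence higher entropy and a smaller audio weight, which shifts the decision toward the video stream. Such a scheme does not react to degradations of the video signal, which is the situation the second question of the paper examines.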

Cite as: Heckmann, M., Berthommier, F., Savariaux, C., Kroschel, K. (2003) Effects of image distortions on audio-visual speech recognition. Proc. Auditory-Visual Speech Processing, 163-168

