ESCA Workshop on Audio-Visual Speech Processing (AVSP'97)
September 26-27, 1997
This paper deals with adaptive integration of visual information in an automatic speech recognition system. Our method consists of attaching a different weight to each modality involved in the recognition process. These acoustic and visual weights are adjusted dynamically, mainly according to the SNR, which is provided to the system as a contextual input. This method is tested on three different audio-visual CHMM-based systems, implementing respectively the direct identification scheme (DI), the separate identification scheme (SI), and the hybrid (DI+SI) scheme. System performances are compared on the same task: speaker-dependent continuous spelling of French letters. Results obtained using audio and visual weights that adapt dynamically to the circumstances are better than those obtained with equal weights, under different test conditions (clean data and data with artificial noise).
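The SNR-driven weighting described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the piecewise-linear SNR-to-weight mapping, the SNR bounds, and the linear combination of modality log-likelihoods are all assumptions introduced here for clarity.

```python
def acoustic_weight(snr_db, snr_low=0.0, snr_high=20.0):
    """Map the measured SNR (in dB) to an acoustic weight in [0, 1].

    Hypothetical piecewise-linear mapping: at low SNR the acoustic
    channel is unreliable, so the weight shifts toward the visual
    modality; at high SNR the acoustic modality dominates.
    """
    if snr_db <= snr_low:
        return 0.0
    if snr_db >= snr_high:
        return 1.0
    return (snr_db - snr_low) / (snr_high - snr_low)


def fused_score(log_p_audio, log_p_visual, snr_db):
    """Combine per-modality log-likelihoods with SNR-dependent weights.

    gamma weights the acoustic log-likelihood, (1 - gamma) the visual
    one; with gamma = 0.5 this reduces to the equal-weight baseline.
    """
    gamma = acoustic_weight(snr_db)
    return gamma * log_p_audio + (1.0 - gamma) * log_p_visual
```

With this sketch, a clean signal (high SNR) relies almost entirely on the acoustic score, while heavy noise pushes the decision toward the visual score, mirroring the adaptive behaviour the abstract contrasts with fixed equal weights.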
Bibliographic reference. Rogozan, Alexandrina / Deléglise, Paul / Alissali, Mamoun (1997): "Adaptive determination of audio and visual weights for automatic speech recognition", In AVSP-1997, 61-64.