AVSP 2003 - International Conference on Audio-Visual Speech Processing
September 4-7, 2003
The improvement in detectability provided by visible speech cues, found by Grant and Seitz (JASA, 108:1197-1208, 2000), has been related to the degree of correlation between acoustic envelopes and visible movements. This suggests that the audio and visual signals could interact early in audio-visual perception on the basis of audio envelope cues. In addition, acoustic-visual correlations were previously reported by Yehia et al. (Speech Communication, 26(1):23-43, 1998). Taking these two facts into account, the problem of extracting the redundant audio-visual components is revisited: the video parametrization of natural images is tested together with three types of audio parameters, leading to new and realistic applications in video synthesis and audio-visual speech enhancement. Consistent with Grant and Seitz's prediction, four-subband envelope energy features are found to be optimal for encoding the redundant components available for the enhancement task. The proposed computational model of audio-visual interaction is based on the product, in the audio pathway, of the time-aligned audio envelopes and the video-predicted envelopes. This interaction scheme is shown to be phonetically neutral, so it does not bias phonetic identification. The low-level stage described here is therefore compatible with a late integration process and is a potential front-end for speech recognition applications.
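The interaction scheme described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The audio envelopes are assumed to be already extracted as four subband energies per frame and time-aligned with the video frames. The video-to-envelope predictor is assumed, hypothetically, to be a linear mapping (`weights`, `bias`) from per-frame video parameters, since the abstract does not specify it. The names `predict_envelopes` and `interact` are likewise hypothetical.

```python
from typing import List, Sequence

# One frame = one envelope energy value per subband (four subbands assumed).
Frame = List[float]


def predict_envelopes(
    video_params: Sequence[Sequence[float]],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[Frame]:
    """Predict subband envelopes from per-frame video parameters.

    Hypothetical linear predictor: one row of `weights` and one `bias` term
    per subband. Negative predictions are clipped to zero because envelope
    energies are non-negative (an assumption of this sketch).
    """
    predicted: List[Frame] = []
    for v in video_params:
        frame = [
            max(0.0, b + sum(w * x for w, x in zip(row, v)))
            for row, b in zip(weights, bias)
        ]
        predicted.append(frame)
    return predicted


def interact(
    audio_env: Sequence[Sequence[float]],
    video_env: Sequence[Sequence[float]],
) -> List[Frame]:
    """Frame-by-frame, subband-by-subband product of the time-aligned
    audio envelopes and the video-predicted envelopes."""
    if len(audio_env) != len(video_env):
        raise ValueError("audio and video envelopes must have the same frame count")
    out: List[Frame] = []
    for fa, fp in zip(audio_env, video_env):
        if len(fa) != len(fp):
            raise ValueError("audio and video frames must have the same subband count")
        out.append([a * p for a, p in zip(fa, fp)])
    return out


if __name__ == "__main__":
    # Two frames, four subbands; two illustrative video parameters per frame.
    audio = [[1.0, 2.0, 0.5, 0.0], [0.2, 0.4, 0.6, 0.8]]
    video = [[1.0, 0.0], [0.0, 1.0]]
    weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]]
    bias = [0.0, 0.0, 0.0, 0.0]
    print(interact(audio, predict_envelopes(video, weights, bias)))
```

Because the combination is a product, each subband of the audio envelope is weighted by the video-predicted energy in that subband at the same instant. Where the visible movements predict little energy, the audio is attenuated. Where they predict more, it is relatively emphasized.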
Presentation. Two versions of the full paper are stored in HTML format, with links to audio-visual presentations. The presentation is packed as a GNU-zipped tar archive (71 MB), which can be opened under both Windows and UNIX. To view the presentation, download the archive and decompress it using the "Use folder name" option. This creates a directory named av03_089; in this directory, open the file Berthommier.html. Note that part of the presentation material is stored on the author's website rather than in this archive.
Bibliographic reference. Berthommier, Frédéric (2003): "A phonetically neutral model of the low-level audiovisual interaction", In AVSP 2003, 89-94.