7th International Conference on Spoken Language Processing
September 16-20, 2002
In this paper, we describe audiovisual automatic speech recognition experiments carried out using visual parameters extracted from "natural" images. Unlike many other experiments in the AV ASR field, these visual parameters are obtained without any hand-labeling phase and are naturally noisy because of the extraction process. We evaluate our models with different strategies, including the use of a shape model combined with, or applied after, an appearance model. For audiovisual parameter integration, we use a basic DI architecture with a fixed weight. We introduce a new evaluation criterion to measure parameter quality, which proves effective, and we aim to use it in the near future for an adaptive weighting scheme.
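The fixed-weight integration mentioned above can be sketched as a weighted combination of per-stream log-likelihoods. This is a minimal illustration, not the paper's implementation: the weight value and function names are hypothetical, and the paper does not specify its exact scoring formula.

```python
def fused_log_likelihood(audio_ll: float, visual_ll: float, lam: float = 0.7) -> float:
    """Fixed-weight fusion of audio and visual stream scores.

    audio_ll, visual_ll: log-likelihoods of a hypothesis under the
    audio-only and visual-only models.
    lam: fixed stream weight in [0, 1] favoring the audio stream
    (an illustrative value, not taken from the paper).
    """
    return lam * audio_ll + (1.0 - lam) * visual_ll


# Example: combine two hypothetical stream scores with equal weight.
score = fused_log_likelihood(-10.0, -20.0, lam=0.5)  # -15.0
```

An adaptive scheme, as the abstract anticipates, would replace the constant `lam` with a value driven by an estimate of each stream's reliability.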
Bibliographic reference. Daubias, Philippe / Deléglise, Paul (2002): "Lip-reading based on a fully automatic statistical model", In ICSLP-2002, 209-212.