5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Using the Multi-Stream Approach for Continuous Audio-Visual Speech Recognition: Experiments on the M2VTS Database

Stéphane Dupont (1), Juergen Luettin (2)

(1) Faculté Polytechnique de Mons (FPMs), Belgium
(2) Institut Dalle Molle d'Intelligence Artificielle Perceptive (IDIAP), Switzerland

This work investigates the Multi-Stream approach to automatic speech recognition as a framework for Audio-Visual data fusion and speech recognition. The method offers several potential advantages for this task. In particular, it permits synchronous decoding of continuous speech while still tolerating some asynchrony between the visual and acoustic information streams. The Multi-Stream formalism is first briefly recalled. Then, building on the motivations for the Multi-Stream approach, experiments on the M2VTS multimodal database are presented and discussed. To our knowledge, these are the first experiments addressing multi-speaker continuous Audio-Visual Speech Recognition (AVSR). The results show that the Multi-Stream approach can improve Audio-Visual speech recognition performance both when the acoustic signal is corrupted by noise and for clean speech.

Bibliographic reference. Dupont, Stéphane / Luettin, Juergen (1998): "Using the multi-stream approach for continuous audio-visual speech recognition: experiments on the M2VTS database", in ICSLP-1998, paper 0582.