Auditory-Visual Speech Processing (AVSP) 2009

University of East Anglia, Norwich, UK
September 10-13, 2009

Space-Time Audio-Visual Speech Recognition with Multiple Multi-Class Probabilistic Support Vector Machines

Samuel Pachoud, Shaogang Gong, Andrea Cavallaro

School of Electronic Engineering and Computer Science, Queen Mary, University of London, UK

We extract relevant and informative audio-visual features using multiple multi-class Support Vector Machines with probabilistic outputs, and demonstrate the approach in a noisy audio-visual speech reading scenario. We first extract visual spatio-temporal features and audio cepstral coefficients from spoken digit sequences. Two classifiers are then trained, one on each modality, to obtain confidence factors that are used to select the most appropriate fusion strategy. A final classifier is trained on the joint audio-visual feature space and used to recognize digits. We evaluate the proposed approach on a standard database and compare it with alternative methods. The evaluation shows that it outperforms the alternatives in both recognition accuracy and robustness.
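
The confidence-driven selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the confidence factors are computed or which fusion strategies are available, so everything here is an assumption made for illustration: the confidence measure (one minus the normalised entropy of the class posteriors), the threshold of 0.5, the strategy names, and the function names `confidence` and `select_fusion` are all hypothetical and are not taken from the paper.

```python
import math


def confidence(posteriors):
    """Hypothetical confidence factor for one modality.

    Returns 1 minus the normalised entropy of the class posteriors:
    about 0.0 for a uniform distribution, 1.0 for a one-hot distribution.
    """
    n_classes = len(posteriors)
    entropy = -sum(p * math.log(p) for p in posteriors if p > 0)
    return 1.0 - entropy / math.log(n_classes)


def select_fusion(audio_post, visual_post, threshold=0.5):
    """Pick a fusion strategy from the two single-modality confidences.

    The strategy names and the threshold are illustrative assumptions,
    not the rule used in the paper.
    """
    c_audio = confidence(audio_post)
    c_visual = confidence(visual_post)
    if c_audio >= threshold and c_visual >= threshold:
        return "joint"          # both modalities look reliable
    if c_audio >= c_visual:
        return "audio_only"     # visual stream is the less reliable one
    return "visual_only"        # audio stream is the less reliable one


# Posteriors over the ten digit classes, as a probabilistic SVM might output.
peaked = [0.91] + [0.01] * 9    # classifier is fairly sure of one digit
flat = [0.1] * 10               # classifier is undecided (e.g. noisy input)

print(select_fusion(peaked, flat))    # audio confident, visual not
print(select_fusion(flat, peaked))    # visual confident, audio not
print(select_fusion(peaked, peaked))  # both confident
```

In this sketch a noisy modality produces flat posteriors, its confidence drops towards zero, and the selection falls back on the other modality. When both modalities are confident, the decision is left to a classifier on the joint feature space.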

Bibliographic reference. Pachoud, Samuel / Gong, Shaogang / Cavallaro, Andrea (2009): "Space-time audio-visual speech recognition with multiple multi-class probabilistic support vector machines", in AVSP-2009, 155-160.