10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Enhancing Audio Speech Using Visual Speech Features

Ibrahim Almajai, Ben Milner

University of East Anglia, UK

This work presents a novel approach to speech enhancement that exploits the bimodality of speech and the correlation between audio and visual speech features. A visually-derived Wiener filter is developed for enhancement. Clean speech statistics are obtained from visual features by modelling the joint density of audio and visual features and making a maximum a posteriori estimate of the clean audio given the visual speech features. Noise statistics for the Wiener filter are obtained with the aid of an audio-visual voice activity detector, which classifies the input audio as speech or non-speech so that a noise model can be updated. Analysis shows that the estimation of speech and noise statistics is effective, and human listening tests measure the effectiveness of the resulting Wiener filter.
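As a rough illustration of how the two sets of statistics combine, the following minimal sketch computes a textbook Wiener gain from a speech power estimate and a noise power estimate, and updates the noise estimate only on frames flagged as non-speech. It is not the authors' implementation. The function names, the smoothing factor `alpha`, the recursive-averaging update rule, and the choice to update only on non-speech frames are all assumptions made for illustration. The visually-derived MAP estimate of the speech statistics and the audio-visual voice activity detector are not modelled here; their outputs are simply passed in as `speech_psd` and `is_speech`.

```python
def wiener_gain(speech_psd, noise_psd, floor=1e-12):
    """Textbook per-channel Wiener gain H = S / (S + N).

    speech_psd: estimated clean speech power per channel (assumed to come
                from some upstream estimator, e.g. one driven by visual features).
    noise_psd:  estimated noise power per channel.
    floor:      small constant guarding against division by zero (assumption).
    """
    return [s / max(s + n, floor) for s, n in zip(speech_psd, noise_psd)]


def update_noise_psd(noise_psd, frame_psd, is_speech, alpha=0.9):
    """Update the noise estimate on non-speech frames only.

    Uses simple recursive averaging (an illustrative choice, not taken from
    the paper). is_speech is the decision of a hypothetical voice activity
    detector for the current frame.
    """
    if is_speech:
        # Speech present: keep the previous noise estimate unchanged.
        return list(noise_psd)
    return [alpha * n + (1.0 - alpha) * x for n, x in zip(noise_psd, frame_psd)]


def apply_gain(noisy_magnitude, gains):
    """Scale each channel of the noisy magnitude spectrum by its gain."""
    return [g * x for g, x in zip(gains, noisy_magnitude)]


if __name__ == "__main__":
    noise = [1.0, 1.0]
    # A non-speech frame arrives, so the noise model is updated.
    noise = update_noise_psd(noise, [2.0, 2.0], is_speech=False, alpha=0.5)
    gains = wiener_gain([1.5, 4.5], noise)
    print(noise, gains, apply_gain([2.0, 2.0], gains))
```

Channels where the estimated speech power dominates receive a gain close to 1, while channels dominated by noise are attenuated towards 0.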


Bibliographic reference.  Almajai, Ibrahim / Milner, Ben (2009): "Enhancing audio speech using visual speech features", In INTERSPEECH-2009, 1959-1962.