Auditory-Visual Speech Processing (AVSP) 2009

University of East Anglia, Norwich, UK
September 10-13, 2009

Audiovisual Speech Recognition with Missing or Unreliable Data

Dorothea Kolossa, Steffen Zeiler, Alexander Vorwerk, Reinhold Orglmeister

Electronics and Medical Signal Processing Group, TU Berlin

For robust recognition of distorted speech, the use of visual information has proven valuable in many recent investigations. However, visual features are not always available, and they can be unreliable under unfavorable recording conditions. The same holds for distorted audio, where noise and interference can corrupt some of the acoustic features used for recognition. In this paper, missing feature techniques for coupled HMMs are shown to cope successfully with both uncertain audio and uncertain video information. Binary uncertainty information can be obtained easily and at little computational cost. The result is therefore an effective approach that can yield significant performance improvements for a wide range of audiovisual recognition systems based on statistical models.
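The abstract does not spell out which missing feature technique is used. To illustrate the general idea, the sketch below assumes one widely used variant, marginalization. Feature dimensions that a binary mask flags as unreliable are integrated out of a diagonal-covariance Gaussian. For a diagonal Gaussian, integrating out a dimension yields a factor of 1, so that dimension simply drops out of the log-likelihood. The audio and video stream scores for one coupled-HMM state pair are then combined with a stream weight `lam`. The function names, the linear log-domain weighting and the parameter `lam` are hypothetical illustrations, not the authors' formulation.

```python
import math
from typing import Sequence, Tuple

# Hypothetical model representation: (means, variances) of a diagonal Gaussian.
DiagGaussian = Tuple[Sequence[float], Sequence[float]]


def masked_diag_gaussian_loglik(x: Sequence[float], mean: Sequence[float],
                                var: Sequence[float], mask: Sequence[bool]) -> float:
    """Log-likelihood of x under a diagonal Gaussian, marginalizing unreliable dims.

    mask[i] is True if feature i is considered reliable (binary uncertainty).
    """
    ll = 0.0
    for xi, mi, vi, reliable in zip(x, mean, var, mask):
        if reliable:
            ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        # Unreliable dims: the marginal of a 1-D Gaussian integrates to 1,
        # so they contribute log(1) = 0 and are skipped.
    return ll


def coupled_state_loglik(audio: Sequence[float], video: Sequence[float],
                         audio_model: DiagGaussian, video_model: DiagGaussian,
                         audio_mask: Sequence[bool], video_mask: Sequence[bool],
                         lam: float = 0.5) -> float:
    """Hypothetical stream-weighted emission score for one (audio, video) state pair."""
    la = masked_diag_gaussian_loglik(audio, *audio_model, audio_mask)
    lv = masked_diag_gaussian_loglik(video, *video_model, video_mask)
    return lam * la + (1.0 - lam) * lv
```

Suppose the video features of a frame are missing entirely, for example because no face was detected. Setting every entry of `video_mask` to `False` then reduces the coupled score to the weighted audio term. The same mechanism lets individual noise-corrupted audio dimensions be ignored without discarding the whole frame.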

Index Terms: missing data techniques, audiovisual speech recognition, coupled HMM


Bibliographic reference.  Kolossa, Dorothea / Zeiler, Steffen / Vorwerk, Alexander / Orglmeister, Reinhold (2009): "Audiovisual speech recognition with missing or unreliable data", In AVSP-2009, 117-122.