The article describes a video-only speech recognition system for a "silent speech interface" application, using ultrasound and optical images of the voice organ. A one-hour audiovisual speech corpus was phonetically labeled using an automatic speech alignment procedure and robust visual feature extraction techniques. HMM-based stochastic models were estimated separately on the visual and acoustic data. The performance of the visual speech recognition system is compared with that of a traditional acoustic-based recognizer.
Bibliographic reference. Hueber, Thomas / Chollet, Gérard / Denby, Bruce / Dreyfus, Gérard / Stone, Maureen (2007): "Continuous-speech phone recognition from ultrasound and optical images of the tongue and lips", In INTERSPEECH-2007, 658-661.