8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Continuous-Speech Phone Recognition from Ultrasound and Optical Images of the Tongue and Lips

Thomas Hueber (1), Gérard Chollet (2), Bruce Denby (1), Gérard Dreyfus (1), Maureen Stone (3)

(1) LE-ESPCI, France
(2) LTCI, France
(3) University of Maryland, USA

This article describes a video-only speech recognition system for a "silent speech interface" application, using ultrasound and optical images of the voice organ. A one-hour audiovisual speech corpus was phonetically labeled using an automatic speech alignment procedure and robust visual feature extraction techniques. HMM-based stochastic models were estimated separately on the visual and acoustic corpus. The performance of the visual speech recognition system is compared to that of a traditional acoustic-based recognizer.

Bibliographic reference. Hueber, Thomas / Chollet, Gérard / Denby, Bruce / Dreyfus, Gérard / Stone, Maureen (2007): "Continuous-speech phone recognition from ultrasound and optical images of the tongue and lips", In INTERSPEECH-2007, 658-661.