9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Phone Recognition from Ultrasound and Optical Video Sequences for a Silent Speech Interface

Thomas Hueber (1), Gérard Chollet (2), Bruce Denby (3), Gérard Dreyfus (1), Maureen Stone (4)

(1) LE-ESPCI, France; (2) LTCI, France; (3) Université Pierre et Marie Curie, France; (4) University of Maryland, USA

Latest results on continuous-speech phone recognition from video observations of the tongue and lips are described in the context of an ultrasound-based silent speech interface. The study is based on a new 61-minute audiovisual database containing ultrasound sequences of the tongue as well as frontal and lateral views of the speaker's lips. Phonetically balanced and exhibiting good diphone coverage, this database is designed for both recognition and corpus-based synthesis. Acoustic waveforms are phonetically labeled, and visual sequences are coded using PCA-based robust feature extraction techniques. Visual and acoustic observations of each phonetic class are modeled by continuous HMMs, allowing the performance of the visual phone recognizer to be compared with that of a traditional acoustic-based phone recognizer. The phone recognition confusion matrix is also discussed in detail.
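The abstract states only that the visual sequences are coded with PCA-based feature extraction; the exact procedure is given in the full paper. As a rough illustration of the general idea, the minimal Python/NumPy sketch below learns a PCA basis from flattened video frames (ultrasound or lip images) and projects each frame onto it, giving one compact feature vector per frame that could serve as an HMM observation. The frame size, the number of components, and the delta-feature step are assumptions chosen for illustration, not the authors' settings, and the sketch omits whatever makes their extraction "robust".

```python
import numpy as np

def fit_pca(frames, n_components):
    """Learn a PCA basis from training frames.

    frames: array of shape (n_frames, height, width).
    Returns the flattened mean image and the top n_components principal axes.
    """
    X = frames.reshape(len(frames), -1).astype(np.float64)
    mean = X.mean(axis=0)
    # SVD of the centered data: rows of vt are orthonormal principal axes,
    # ordered by decreasing singular value (i.e. decreasing variance).
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(frames, mean, components):
    """Project frames onto the PCA basis: one feature vector per frame."""
    X = frames.reshape(len(frames), -1).astype(np.float64)
    return (X - mean) @ components.T

def add_deltas(features):
    """Append first-order temporal differences (an assumed, common HMM front-end step)."""
    deltas = np.gradient(features, axis=0)
    return np.hstack([features, deltas])

if __name__ == "__main__":
    # Synthetic stand-in for video frames; real data would be image sequences.
    rng = np.random.default_rng(0)
    train = rng.random((200, 32, 40))
    mean, comps = fit_pca(train, n_components=30)
    feats = add_deltas(project(train[:50], mean, comps))
    print(feats.shape)  # (50, 60)
```

Computing the basis via SVD of the centered data avoids forming the large pixel-by-pixel covariance matrix explicitly, which matters when each frame has thousands of pixels.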


Bibliographic reference.  Hueber, Thomas / Chollet, Gérard / Denby, Bruce / Dreyfus, Gérard / Stone, Maureen (2008): "Phone recognition from ultrasound and optical video sequences for a silent speech interface", In INTERSPEECH-2008, 2032-2035.