ISCA Archive Interspeech 2009

Visuo-phonetic decoding using multi-stream and context-dependent models for an ultrasound-based silent speech interface

Thomas Hueber, Elie-Laurent Benaroya, Gérard Chollet, Bruce Denby, Gérard Dreyfus, Maureen Stone

Recent improvements are presented for phonetic decoding of continuous speech from ultrasound and optical observations of the tongue and lips in a silent speech interface application. In a new approach to this critical step, the visual streams are modeled by context-dependent multi-stream Hidden Markov Models (CD-MSHMM). Results are compared to a baseline system using context-independent modeling and a visual feature fusion strategy, with both systems evaluated on a one-hour, phonetically balanced English speech database. Tongue and lip images are coded using PCA-based feature extraction techniques. The uttered speech signal, also recorded, is used to initialize the training of the visual HMMs. Visual phonetic decoding performance is evaluated both with and without linguistic constraints, the latter introduced via a 2.5k-word decoding dictionary.
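As a rough illustration of the two modeling ingredients the abstract names, the sketch below shows PCA-based coding of flattened image frames and an HTK-style weighted combination of per-stream GMM log-likelihoods, as used in a multi-stream HMM state. It is not code from the paper; the function names, the number of components, and the stream weights are all hypothetical.

    import numpy as np
    from scipy.stats import multivariate_normal
    from sklearn.decomposition import PCA

    # --- PCA-based visual feature coding (EigenTongues/EigenLips-style sketch) ---
    # frames: (n_frames, height*width) array of flattened ultrasound or lip images
    def pca_features(frames, n_components=30):
        pca = PCA(n_components=n_components)   # component count is illustrative
        return pca.fit_transform(frames)       # one feature vector per frame

    # --- Multi-stream HMM state emission (HTK-style weighted combination) ---
    # Each visual stream (tongue, lips) has its own GMM; stream weights
    # exponentiate the per-stream likelihoods, i.e. scale their log-likelihoods.
    def multistream_log_likelihood(stream_obs, stream_gmms, stream_weights):
        total = 0.0
        for o, gmm, w in zip(stream_obs, stream_gmms, stream_weights):
            # gmm: list of (mix_weight, mean, cov) mixture components
            p = sum(c * multivariate_normal.pdf(o, mean=mu, cov=cov)
                    for c, mu, cov in gmm)
            total += w * np.log(p + 1e-300)    # floor avoids log(0)
        return total

By contrast, the baseline system described in the abstract would concatenate the tongue and lip features into a single observation vector (feature fusion) and model it with context-independent HMMs.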


doi: 10.21437/Interspeech.2009-226

Cite as: Hueber, T., Benaroya, E.-L., Chollet, G., Denby, B., Dreyfus, G., Stone, M. (2009) Visuo-phonetic decoding using multi-stream and context-dependent models for an ultrasound-based silent speech interface. Proc. Interspeech 2009, 640-643, doi: 10.21437/Interspeech.2009-226

@inproceedings{hueber09_interspeech,
  author={Thomas Hueber and Elie-Laurent Benaroya and Gérard Chollet and Bruce Denby and Gérard Dreyfus and Maureen Stone},
  title={{Visuo-phonetic decoding using multi-stream and context-dependent models for an ultrasound-based silent speech interface}},
  year={2009},
  booktitle={Proc. Interspeech 2009},
  pages={640--643},
  doi={10.21437/Interspeech.2009-226}
}