ISCA Archive Interspeech 2008

Towards a segmental vocoder driven by ultrasound and optical images of the tongue and lips

Thomas Hueber, Gérard Chollet, Bruce Denby, Gérard Dreyfus, Maureen Stone

This article presents a framework for a phonetic vocoder driven by ultrasound and optical images of the tongue and lips, intended for a "silent speech interface" application. The system is built around an HMM-based visual phone recognition step, which provides target phonetic sequences from a continuous stream of visual observations. The recognized phonetic target constrains the search for the optimal sequence of diphones, which maximizes similarity to the input test data in visual space subject to a unit concatenation cost in the acoustic domain. The final speech waveform is generated using "Harmonic plus Noise Model" synthesis techniques. Experimental results are based on a one-hour continuous-speech audiovisual database comprising ultrasound images of the tongue and both frontal and lateral views of the speaker's lips.
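The diphone selection described in the abstract can be framed as a dynamic-programming (Viterbi-style) search: each target diphone slot carries a target cost measured in visual feature space, and consecutive candidate units incur an acoustic join cost. The sketch below illustrates this idea only; the function names, dictionary fields, distance metrics, and cost weights are illustrative assumptions, not details taken from the paper, and the HNM waveform generation stage is not covered.

import numpy as np

def select_diphones(target_feats, candidates, concat_cost,
                    w_target=1.0, w_concat=1.0):
    """Illustrative Viterbi-style search over candidate diphone units.

    target_feats : list of visual feature vectors, one per target diphone slot
    candidates   : per-slot lists of candidate units; each unit is a dict with
                   'visual' and 'acoustic' feature vectors (assumed layout)
    concat_cost  : function(prev_unit, unit) -> acoustic join cost
    Returns the unit sequence minimizing total (target + concatenation) cost.
    """
    n = len(target_feats)
    cost = [np.full(len(candidates[i]), np.inf) for i in range(n)]
    back = [np.zeros(len(candidates[i]), dtype=int) for i in range(n)]

    # first slot: visual target cost only (Euclidean distance assumed)
    for j, unit in enumerate(candidates[0]):
        cost[0][j] = w_target * np.linalg.norm(target_feats[0] - unit['visual'])

    # dynamic programming over the remaining slots
    for i in range(1, n):
        for j, unit in enumerate(candidates[i]):
            t_cost = w_target * np.linalg.norm(target_feats[i] - unit['visual'])
            joins = [cost[i - 1][k] + w_concat * concat_cost(prev, unit)
                     for k, prev in enumerate(candidates[i - 1])]
            k_best = int(np.argmin(joins))
            cost[i][j] = joins[k_best] + t_cost
            back[i][j] = k_best

    # backtrack the best path from the last slot
    path = [int(np.argmin(cost[-1]))]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return [candidates[i][path[i]] for i in range(n)]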


doi: 10.21437/Interspeech.2008-527

Cite as: Hueber, T., Chollet, G., Denby, B., Dreyfus, G., Stone, M. (2008) Towards a segmental vocoder driven by ultrasound and optical images of the tongue and lips. Proc. Interspeech 2008, 2028-2031, doi: 10.21437/Interspeech.2008-527

@inproceedings{hueber08_interspeech,
  author={Thomas Hueber and Gérard Chollet and Bruce Denby and Gérard Dreyfus and Maureen Stone},
  title={{Towards a segmental vocoder driven by ultrasound and optical images of the tongue and lips}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={2028--2031},
  doi={10.21437/Interspeech.2008-527}
}