15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Automatic Animation of an Articulatory Tongue Model from Ultrasound Images Using Gaussian Mixture Regression

Diandra Fabre, Thomas Hueber, Pierre Badin

GIPSA, France

This paper presents a method for automatically animating the articulatory tongue model of a reference speaker from ultrasound images of the tongue of another speaker. This work is developed in the context of speech therapy based on visual biofeedback, where a speaker is provided with visual information about his/her own articulation. In our approach, the feedback is delivered via an articulatory talking head, which displays the tongue during speech production using augmented reality (e.g. transparent skin). The user's tongue movements are captured using ultrasound imaging and parameterized using the PCA-based EigenTongue technique. The extracted features are then converted into control parameters of the articulatory tongue model using Gaussian Mixture Regression. This procedure was evaluated by decoding the converted tongue movements at the phonetic level using an HMM-based decoder trained on the reference speaker's articulatory data. Decoding errors were then manually reassessed to take into account possible phonetic idiosyncrasies (i.e. speaker- or phoneme-specific articulatory strategies). With a system trained on a limited set of 88 vowel-consonant-vowel (VCV) sequences, the recognition accuracy at the phonetic level was approximately 70%.
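The abstract chains two learned mappings: a PCA projection of ultrasound frames (EigenTongue features) and a Gaussian Mixture Regression (GMR) from those features to the tongue model's control parameters. The following is a minimal NumPy sketch of both steps, not the authors' implementation. It assumes that a joint GMM over concatenated source/target vectors z = [x, y] has already been trained (e.g. by EM on time-aligned frame pairs). All function names are hypothetical, and the paper's actual settings (image preprocessing, number of eigentongues, number of mixture components, alignment procedure) are not reproduced. The GMR step computes the standard conditional expectation E[y | x] = Σ_k p(k | x) (μ_y^k + Σ_yx^k (Σ_xx^k)^-1 (x − μ_x^k)).

```python
import numpy as np

# Hypothetical helper names; a sketch under stated assumptions, not the paper's code.


def eigentongue_fit(frames, n_components):
    """Learn an EigenTongue-style basis: principal axes of vectorized frames.

    frames: array of shape (n_frames, height, width).
    Returns (mean_vector, basis) with basis of shape (n_components, height * width).
    """
    X = frames.reshape(len(frames), -1).astype(float)
    mean = X.mean(axis=0)
    # Rows of vt from the SVD of the centered data matrix are the principal axes.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def eigentongue_features(frames, mean, basis):
    """Project each frame onto the basis, giving (n_frames, n_components) features."""
    X = frames.reshape(len(frames), -1).astype(float)
    return (X - mean) @ basis.T


def log_gaussian(x, mu, cov):
    """Log-density of the multivariate normal N(mu, cov) evaluated at x."""
    diff = x - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (maha + logdet + len(mu) * np.log(2.0 * np.pi))


def gmr_predict(x, weights, means, covs, dx):
    """GMR estimate E[y | x] from a joint GMM over z = [x, y].

    weights: (K,) mixture weights; means: (K, dx + dy); covs: (K, dx + dy, dx + dy);
    dx: dimension of the input (source) part of z.
    """
    # Posterior p(k | x) of each component, computed in the log domain for stability.
    log_r = np.array([
        np.log(w) + log_gaussian(x, m[:dx], c[:dx, :dx])
        for w, m, c in zip(weights, means, covs)
    ])
    resp = np.exp(log_r - log_r.max())
    resp /= resp.sum()
    # Responsibility-weighted sum of the per-component conditional means.
    y = np.zeros(means.shape[1] - dx)
    for r, m, c in zip(resp, means, covs):
        cond = m[dx:] + c[dx:, :dx] @ np.linalg.solve(c[:dx, :dx], x - m[:dx])
        y += r * cond
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.random((50, 8, 8))           # random stand-in for ultrasound frames
    mean, basis = eigentongue_fit(frames, n_components=5)
    x = eigentongue_features(frames, mean, basis)
    y = x @ rng.normal(size=(5, 3))           # random stand-in for tongue-model parameters
    z = np.hstack([x, y])
    # One-component "GMM" fitted by moments; a real system would use EM with several components.
    weights = np.array([1.0])
    means = z.mean(axis=0)[None]
    covs = np.cov(z, rowvar=False)[None]
    print(gmr_predict(x[0], weights, means, covs, dx=5), y[0])
```

With a single component, the GMR estimate reduces to ordinary linear regression of y on x, which is what the toy demo exercises. With several components, the responsibilities p(k | x) let the source-to-target mapping become piecewise and nonlinear.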


Bibliographic reference.  Fabre, Diandra / Hueber, Thomas / Badin, Pierre (2014): "Automatic animation of an articulatory tongue model from ultrasound images using Gaussian mixture regression", In INTERSPEECH-2014, 2293-2297.