INTERSPEECH 2012
13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Cross-speaker Acoustic-to-Articulatory Inversion using Phone-based Trajectory HMM for Pronunciation Training

Thomas Hueber, Atef Ben-Youssef, Gérard Bailly, Pierre Badin, Frédéric Elisei

GIPSA-lab, UMR 5216/CNRS/INP/UJF/U.Stendhal, Grenoble, France

This article presents a statistical mapping approach for cross-speaker acoustic-to-articulatory inversion. The goal is to estimate the most likely articulatory trajectories for a reference speaker from the speech audio signal of another speaker. The approach is developed within our visual articulatory feedback system for computer-assisted pronunciation training (CAPT) applications. The proposed technique is based on the joint modeling of articulatory and acoustic features, for each phonetic class, using full-covariance trajectory HMMs. The acoustic-to-articulatory inversion is achieved in two steps: 1) finding the most likely HMM state sequence from the acoustic observations; 2) inferring the articulatory trajectories from both the decoded state sequence and the acoustic observations. The problem of speaker adaptation is addressed using a voice conversion approach based on a trajectory GMM.
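
The two inversion steps described above can be illustrated with a minimal sketch. It is a simplified illustration rather than the authors' implementation: it assumes one joint full-covariance Gaussian per state over stacked [acoustic, articulatory] features, decodes states with a plain Viterbi pass on the acoustic marginals, and then takes the frame-wise conditional mean of the articulatory part. It omits the dynamic-feature constraints that make the model a trajectory HMM, as well as the voice conversion stage. All names (`log_gauss`, `viterbi`, `invert`) and parameters are hypothetical.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + maha)

def viterbi(log_b, log_A, log_pi):
    """Most likely state path given frame log-likelihoods log_b (T x S)."""
    T, S = log_b.shape
    delta = log_pi + log_b[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A      # scores[i, j]: move from i to j
        back[t] = scores.argmax(axis=0)      # best predecessor of each state
        delta = scores.max(axis=0) + log_b[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):            # backtrack from the last frame
        path.append(int(back[t][path[-1]]))
    return path[::-1]

def invert(acoustic, means, covs, log_A, log_pi, dx):
    """Assumed layout: the first dx dimensions of each joint Gaussian are
    acoustic, the remaining ones articulatory."""
    S = len(means)
    # Step 1: decode the state sequence from the acoustic marginals.
    log_b = np.array([[log_gauss(x, means[s][:dx], covs[s][:dx, :dx])
                       for s in range(S)] for x in acoustic])
    states = viterbi(log_b, log_A, log_pi)
    # Step 2: per frame, conditional mean E[articulatory | acoustic, state].
    frames = []
    for x, s in zip(acoustic, states):
        mx, my = means[s][:dx], means[s][dx:]
        Sxx, Syx = covs[s][:dx, :dx], covs[s][dx:, :dx]
        frames.append(my + Syx @ np.linalg.solve(Sxx, x - mx))
    return states, np.array(frames)
```

Because each frame is inverted independently here, the output is not guaranteed to be smooth. A trajectory formulation would additionally constrain the result through delta features.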

Index Terms: acoustic-to-articulatory inversion, intelligent tutoring systems, pronunciation training, trajectory HMM, voice conversion, talking head

Bibliographic reference. Hueber, Thomas / Ben-Youssef, Atef / Bailly, Gérard / Badin, Pierre / Elisei, Frédéric (2012): "Cross-speaker acoustic-to-articulatory inversion using phone-based trajectory HMM for pronunciation training", In INTERSPEECH-2012, 783-786.