In this paper, we present recent developments on the HMM-based acoustic-to-articulatory inversion approach that we are developing for a "visual articulatory feedback" system. In this approach, multistream phoneme HMMs are trained jointly on synchronous streams of acoustic and articulatory data acquired by electromagnetic articulography (EMA). Acoustic-to-articulatory inversion is achieved in two steps: phonetic and state decoding is performed first; articulatory trajectories are then inferred from the decoded phone and state sequence using the maximum-likelihood parameter generation (MLPG) algorithm. We introduce here a new procedure for re-estimating the HMM parameters, based on the Minimum Generation Error (MGE) criterion. We also investigate the use of model adaptation techniques based on maximum likelihood linear regression (MLLR), as a first step toward a multi-speaker visual articulatory feedback system.
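To make the second inversion step concrete, the following is a minimal NumPy sketch of the standard MLPG solution for a single articulatory dimension: given the per-frame static and delta means and variances selected by the decoded state sequence, the trajectory maximizing the likelihood solves the normal equations (W^T Σ^{-1} W) c = W^T Σ^{-1} μ. This is a generic illustration of the algorithm under a common delta window, not the authors' implementation; the function name `mlpg_1d` and the specific window coefficients are assumptions.

```python
import numpy as np

def mlpg_1d(mu_static, mu_delta, var_static, var_delta):
    """Maximum-likelihood parameter generation (MLPG) for one dimension.

    Assumes diagonal covariances and the common delta window
    delta_t = (c_{t+1} - c_{t-1}) / 2. Solves the normal equations
    (W^T P W) c = W^T P mu, where P is the diagonal precision matrix.
    """
    T = len(mu_static)
    # Window matrix W stacks an identity block (static features)
    # over a banded block computing the deltas.
    W_delta = np.zeros((T, T))
    for t in range(T):
        if t > 0:
            W_delta[t, t - 1] = -0.5
        if t < T - 1:
            W_delta[t, t + 1] = 0.5
    W = np.vstack([np.eye(T), W_delta])            # shape (2T, T)
    mu = np.concatenate([mu_static, mu_delta])     # stacked means
    prec = np.concatenate([1.0 / var_static, 1.0 / var_delta])
    A = W.T @ (prec[:, None] * W)                  # W^T P W
    b = W.T @ (prec * mu)                          # W^T P mu
    return np.linalg.solve(A, b)
```

When the delta means are exactly consistent with the static means, the generated trajectory recovers the static means; in practice the delta constraints smooth the piecewise-constant state means into continuous articulator trajectories.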
Bibliographic reference. Youssef, Atef Ben / Hueber, Thomas / Badin, Pierre / Bailly, Gérard (2011): "Toward a multi-speaker visual articulatory feedback system", In INTERSPEECH-2011, 589-592.