In order to recover the movements of usually hidden articulators such as the tongue or the velum, we have developed a data-based speech inversion method. HMMs are trained, in a multistream framework, from two synchronous streams: articulatory movements measured by EMA, and MFCCs plus energy computed from the speech signal. A speech recognition procedure based on the acoustic part of the HMMs delivers the phoneme chain together with the phoneme durations; this information is then used by a trajectory formation procedure, based on the articulatory part of the HMMs, to synthesise the articulatory movements. The RMS reconstruction error ranged between 1.1 and 2 mm.
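To make the two-stage pipeline concrete, the following is a minimal sketch of the second stage only: given a recognized phoneme chain with durations, synthesize an articulatory trajectory from per-phoneme targets. All names, target values, and the moving-average smoothing are invented for illustration; the paper's actual trajectory formation uses the articulatory part of trained phoneme HMMs, not fixed targets.

```python
import numpy as np

# Hypothetical per-phoneme articulatory targets (e.g. tongue-tip x, y in mm).
# In the paper these would come from the articulatory stream of the HMMs;
# the values here are invented purely for illustration.
PHONE_TARGETS = {
    "a": np.array([0.0, -2.0]),
    "t": np.array([4.0, 1.0]),
    "u": np.array([-1.0, -3.0]),
}

def synthesize_trajectory(phones, durations, fps=100, window=5):
    """Hold each phoneme's target for its duration, then apply a simple
    moving-average smoothing as a crude stand-in for HMM-based
    trajectory formation."""
    frames = []
    for ph, dur in zip(phones, durations):
        n = int(round(dur * fps))  # duration in seconds -> frame count
        frames.extend([PHONE_TARGETS[ph]] * n)
    traj = np.array(frames, dtype=float)
    kernel = np.ones(window) / window
    # Smooth each articulatory coordinate independently along time.
    return np.column_stack(
        [np.convolve(traj[:, d], kernel, mode="same") for d in range(traj.shape[1])]
    )

def rms_error(pred, ref):
    """RMS reconstruction error, the evaluation measure used in the paper."""
    return float(np.sqrt(np.mean((pred - ref) ** 2)))

# Recognized chain /a t u/ with durations 100, 50, 120 ms -> 27 frames at 100 fps.
traj = synthesize_trajectory(["a", "t", "u"], [0.10, 0.05, 0.12])
print(traj.shape)  # (27, 2)
```

The recognition stage (not sketched here) would supply `phones` and `durations` from a forced decoding with the acoustic part of the same HMMs, which is what keeps the two streams synchronized.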
Bibliographic reference. Ben Youssef, Atef / Badin, Pierre / Bailly, Gérard / Heracleous, Panikos (2009): "Acoustic-to-articulatory inversion using speech recognition and trajectory formation based on phoneme hidden Markov models", in INTERSPEECH-2009, pp. 2255-2258.