In order to recover the movements of usually hidden articulators such as the tongue or the velum, we have developed a data-based speech inversion method. HMMs are trained, in a multistream framework, on two synchronous streams: articulatory movements measured by EMA, and MFCC + energy features extracted from the speech signal. A speech recognition procedure based on the acoustic part of the HMMs delivers the phoneme chain together with the phoneme durations; this information is then used by a trajectory formation procedure based on the articulatory part of the HMMs to synthesise the articulatory movements. The RMS reconstruction error ranged between 1.1 and 2 mm.
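The two-stage pipeline described above (acoustic recognition of the phoneme chain, then articulatory trajectory formation from the recognised phonemes) can be illustrated with a deliberately simplified sketch. This is not the authors' system: the phoneme set, the 2-D "MFCC" and "EMA" means, and the single-Gaussian-per-phoneme models below are hypothetical toy stand-ins for the paper's multistream HMMs, and a short moving average replaces the HMM-based trajectory formation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy models: one Gaussian mean per phoneme, for each stream.
phones = ["a", "i", "u"]
acoustic_means = {"a": np.array([1.0, 0.0]),    # 2-D "MFCC" means (toy)
                  "i": np.array([-1.0, 1.0]),
                  "u": np.array([0.0, -1.0])}
artic_means = {"a": np.array([2.0, 5.0]),       # 2-D "EMA" targets in mm (toy)
               "i": np.array([6.0, 1.0]),
               "u": np.array([4.0, 8.0])}

def recognise(frames):
    """Stage 1 (stand-in for acoustic-HMM recognition): frame-wise
    maximum-likelihood phoneme labelling with unit-variance Gaussians,
    i.e. nearest acoustic mean."""
    labels = []
    for x in frames:
        dists = {p: np.sum((x - m) ** 2) for p, m in acoustic_means.items()}
        labels.append(min(dists, key=dists.get))
    return labels

def trajectory(labels):
    """Stage 2 (stand-in for articulatory-HMM trajectory formation):
    map each recognised phoneme to its articulatory mean, then smooth
    with a short moving average to mimic continuous movements."""
    raw = np.stack([artic_means[p] for p in labels])
    kernel = np.ones(3) / 3.0
    return np.stack([np.convolve(raw[:, d], kernel, mode="same")
                     for d in range(raw.shape[1])], axis=1)

# Synthetic utterance: 5 noisy frames of /a/ followed by 5 of /i/.
frames = np.vstack([acoustic_means["a"] + 0.1 * rng.standard_normal((5, 2)),
                    acoustic_means["i"] + 0.1 * rng.standard_normal((5, 2))])
labels = recognise(frames)
traj = trajectory(labels)
```

In the real system each phoneme is a multi-state HMM decoded over whole utterances, and the trajectory formation exploits the articulatory stream of the same HMMs rather than a fixed mean per phoneme; the sketch only mirrors the division of labour between the two stages.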
Cite as: Youssef, A.B., Badin, P., Bailly, G., Heracleous, P. (2009) Acoustic-to-articulatory inversion using speech recognition and trajectory formation based on phoneme hidden Markov models. Proc. Interspeech 2009, 2255-2258, doi: 10.21437/Interspeech.2009-640
@inproceedings{youssef09_interspeech,
  author={Atef Ben Youssef and Pierre Badin and Gérard Bailly and Panikos Heracleous},
  title={{Acoustic-to-articulatory inversion using speech recognition and trajectory formation based on phoneme hidden Markov models}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={2255--2258},
  doi={10.21437/Interspeech.2009-640}
}