The article presents an HMM-based mapping approach for converting ultrasound and video images of the vocal tract into an audible speech signal, for a silent speech interface application. The proposed technique jointly models articulatory and spectral features, for each phonetic class, using hidden Markov models (HMMs) and multivariate Gaussian distributions with full covariance matrices. The articulatory-to-acoustic mapping is achieved in two steps: 1) finding the most likely HMM state sequence from the articulatory observations; 2) inferring the spectral trajectories from both the decoded state sequence and the articulatory observations. The proposed technique is compared to our previous approach, in which only the decoded state sequence was used to infer the spectral trajectories, independently of the articulatory observations. Both objective and perceptual evaluations show that the new approach leads to a better estimation of the spectral trajectories.
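The two-step mapping described above can be illustrated with a minimal sketch. All parameters, dimensions, and function names below are invented toy values, not the paper's actual HMM topology or feature sets, and step 2 here uses the per-state conditional Gaussian mean E[y | x, state] (which follows from a joint full-covariance Gaussian over articulatory and spectral features) rather than the paper's full trajectory formulation:

```python
import numpy as np

def viterbi(log_lik, log_A, log_pi):
    """Most likely state sequence given per-frame state log-likelihoods."""
    T, N = log_lik.shape
    delta = np.full((T, N), -np.inf)
    psi = np.zeros((T, N), dtype=int)
    delta[0] = log_pi + log_lik[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A  # scores[i, j]: best path in i, then i -> j
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(N)] + log_lik[t]
    states = np.empty(T, dtype=int)
    states[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):
        states[t] = psi[t + 1, states[t + 1]]
    return states

def gaussian_loglik(X, mean, cov):
    """log N(x; mean, cov) for each row of X, via a Cholesky factor."""
    diff = X - mean
    L = np.linalg.cholesky(cov)
    sol = np.linalg.solve(L, diff.T).T
    return (-0.5 * (np.sum(sol ** 2, axis=1) + len(mean) * np.log(2 * np.pi))
            - np.log(np.diag(L)).sum())

def articulatory_to_spectral(X, means, covs, log_A, log_pi, dx):
    """Two-step mapping: (1) decode the most likely state sequence from the
    articulatory stream alone; (2) per frame, infer the spectral features as
    the conditional Gaussian mean given the observed articulatory vector and
    the decoded state's joint full-covariance Gaussian."""
    N = len(means)
    # Step 1: state likelihoods use only the articulatory marginal (first dx dims).
    log_lik = np.stack([gaussian_loglik(X, means[k][:dx], covs[k][:dx, :dx])
                        for k in range(N)], axis=1)
    states = viterbi(log_lik, log_A, log_pi)
    # Step 2: E[y | x, state] = mu_y + S_yx S_xx^{-1} (x - mu_x).
    Y = np.empty((len(X), means[0].size - dx))
    for t, k in enumerate(states):
        mu_x, mu_y = means[k][:dx], means[k][dx:]
        Sxx, Syx = covs[k][:dx, :dx], covs[k][dx:, :dx]
        Y[t] = mu_y + Syx @ np.linalg.solve(Sxx, X[t] - mu_x)
    return Y, states

# Toy model: 2 states, 1-D articulatory (x) and 1-D spectral (y) features,
# correlated within each state via a full 2x2 covariance. Numbers are invented.
means = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
covs = [np.array([[1.0, 0.8], [0.8, 1.0]])] * 2
log_A = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
log_pi = np.log(np.array([0.5, 0.5]))

X = np.array([[0.0], [0.2], [5.1], [4.8]])
Y, states = articulatory_to_spectral(X, means, covs, log_A, log_pi, dx=1)
```

Because the within-state covariance couples the two streams, the inferred spectral value tracks the frame-level articulatory deviation from the state mean, which is precisely what distinguishes this scheme from using the decoded state sequence alone.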
Index Terms: silent speech interface, handicap, HMM-based speech synthesis, audiovisual speech processing
Bibliographic reference. Hueber, Thomas / Bailly, Gérard / Denby, Bruce (2012): "Continuous articulatory-to-acoustic mapping using phone-based trajectory HMM for a silent speech interface", In INTERSPEECH-2012, 723-726.