Acoustic-to-articulatory inversion is useful for a range of related research areas including language learning, speech production, speech coding, speech recognition and speech synthesis. HMM-based generative modelling methods and DNN-based approaches have become dominant approaches in recent years. In this paper, a novel acoustic-to-articulatory inversion technique based on generalized variable parameter HMMs (GVP-HMMs) is proposed. It leverages the strengths of both generative and neural network based modelling frameworks. On a Mandarin speech inversion task, a tandem GVP-HMM system using DNN bottleneck features as auxiliary inputs significantly outperformed the baseline HMM, multiple regression HMM (MR-HMM), DNN and deep mixture density network (MDN) systems by 0.20mm, 0.16mm, 0.12mm and 0.10mm respectively in terms of electromagnetic articulography (EMA) root mean square error (RMSE).
Bibliographic reference. Xie, Xurong / Liu, Xunying / Wang, Lan / Su, Rongfeng (2015): "Generalized variable parameter HMMs based acoustic-to-articulatory inversion", In INTERSPEECH-2015, 279-283.