16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Generalized Variable Parameter HMMs Based Acoustic-to-Articulatory Inversion

Xurong Xie, Xunying Liu, Lan Wang, Rongfeng Su

Chinese Academy of Sciences, China

Acoustic-to-articulatory inversion is useful for a range of research areas including language learning, speech production, speech coding, speech recognition and speech synthesis. HMM-based generative modelling methods and DNN-based methods have become the dominant approaches in recent years. In this paper, a novel acoustic-to-articulatory inversion technique based on generalized variable parameter HMMs (GVP-HMMs) is proposed. It leverages the strengths of both generative and neural network based modelling frameworks. On a Mandarin speech inversion task, a tandem GVP-HMM system using DNN bottleneck features as auxiliary inputs significantly outperformed the baseline HMM, multiple regression HMM (MR-HMM), DNN and deep mixture density network (MDN) systems, reducing the electromagnetic articulography (EMA) root mean square error (RMSE) by 0.20 mm, 0.16 mm, 0.12 mm and 0.10 mm respectively.
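
The RMSE figures above are distances in millimetres between predicted and measured articulator positions. The abstract does not say how the error is aggregated, so the following is only a minimal sketch under an assumed convention: RMSE is computed per EMA channel over all frames and then averaged across channels. The function name `ema_rmse` and the toy arrays are hypothetical and are not taken from the paper.

```python
import numpy as np


def ema_rmse(predicted, reference):
    """Return (per-channel RMSE, mean RMSE over channels).

    Both inputs have shape (frames, channels) and are in millimetres.
    Assumption: error is averaged over frames per channel, then over channels.
    """
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if predicted.shape != reference.shape:
        raise ValueError("predicted and reference must have the same shape")
    # Squared error per frame and channel, averaged over frames (axis 0).
    per_channel = np.sqrt(np.mean((predicted - reference) ** 2, axis=0))
    return per_channel, float(per_channel.mean())


# Toy data: 2 frames, 2 channels (made-up values, not results from the paper).
reference = np.array([[0.0, 0.0], [0.0, 0.0]])
predicted = np.array([[3.0, 1.0], [4.0, 1.0]])
per_channel, overall = ema_rmse(predicted, reference)
```

Under this convention, an improvement such as 0.20 mm would be the difference between the `overall` values of two systems evaluated on the same test frames.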

Bibliographic reference. Xie, Xurong / Liu, Xunying / Wang, Lan / Su, Rongfeng (2015): "Generalized variable parameter HMMs based acoustic-to-articulatory inversion", in INTERSPEECH-2015, 279-283.