INTERSPEECH 2013
14thAnnual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Relevance-Weighted-Reconstruction of Articulatory Features in Deep-Neural-Network-Based Acoustic-to-Articulatory Mapping

Claudia Canevari, Leonardo Badino, Luciano Fadiga, Giorgio Metta

Istituto Italiano di Tecnologia, Italy

We present a strategy for learning Deep-Neural-Network (DNN)- based Acoustic-to-Articulatory Mapping (AAM) functions where the contribution of an articulatory feature (AF) to the global recon- struction error is weighted by its relevance. We first empirically show that when an articulator is more crucial for the production of a given phone it is less variable, confirming previous findings. We then compute the relevance of an articulatory feature as a function of its frame-wise variance dependent on the acoustic evidence which is estimated through a Mixture Density Network (MDN). Finally we combine acoustic and recovered articulatory features in a hybrid DNN-HMM phone recognizer. Tested on the MOCHATIMIT corpus, articulatory features reconstructed by a standardly trained DNN lead to a 8.4% relative phone error reduction (w.r.t. a recognizer that only uses MFCCs), whereas when the articulatory features are reconstructed taking into account their relevance the relative phone error reduction increased to 10.9%.

Full Paper

Bibliographic reference.  Canevari, Claudia / Badino, Leonardo / Fadiga, Luciano / Metta, Giorgio (2013): "Relevance-weighted-reconstruction of articulatory features in deep-neural-network-based acoustic-to-articulatory mapping", In INTERSPEECH-2013, 1297-1301.