ISCA Archive Interspeech 2013

Relevance-weighted-reconstruction of articulatory features in deep-neural-network-based acoustic-to-articulatory mapping

Claudia Canevari, Leonardo Badino, Luciano Fadiga, Giorgio Metta

We present a strategy for learning Deep-Neural-Network (DNN)-based Acoustic-to-Articulatory Mapping (AAM) functions in which the contribution of an articulatory feature (AF) to the global reconstruction error is weighted by its relevance. We first show empirically that an articulator is less variable when it is more crucial for the production of a given phone, confirming previous findings. We then compute the relevance of an articulatory feature as a function of its frame-wise variance, conditioned on the acoustic evidence and estimated through a Mixture Density Network (MDN). Finally, we combine acoustic and recovered articulatory features in a hybrid DNN-HMM phone recognizer. Tested on the MOCHA-TIMIT corpus, articulatory features reconstructed by a standardly trained DNN lead to an 8.4% relative phone error reduction (w.r.t. a recognizer that uses only MFCCs), whereas articulatory features reconstructed taking their relevance into account increase the relative phone error reduction to 10.9%.
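
The idea of a relevance-weighted reconstruction error can be illustrated with a minimal sketch. The abstract does not give the exact function that maps variance to relevance, so the inverse-variance weighting below is an assumption made purely for illustration, as are the function names `relevance_weights` and `weighted_reconstruction_error`. The variances are taken as given inputs here; in the paper they are estimated by an MDN, which this sketch does not implement.

```python
import numpy as np


def relevance_weights(variances, eps=1e-8):
    """Turn frame-wise AF variances into relevance weights.

    ASSUMPTION (not from the paper): relevance is inversely proportional
    to variance, normalised so the weights in each frame sum to the
    number of articulatory features.
    variances: array of shape (n_frames, n_features), all values > 0.
    """
    variances = np.asarray(variances, dtype=float)
    inv = 1.0 / (variances + eps)
    n_features = variances.shape[1]
    return n_features * inv / inv.sum(axis=1, keepdims=True)


def weighted_reconstruction_error(targets, predictions, weights):
    """Mean over frames of the relevance-weighted squared error."""
    targets = np.asarray(targets, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    squared_error = (targets - predictions) ** 2
    # Low-variance (more relevant) features contribute more to the error.
    return float(np.mean(np.sum(weights * squared_error, axis=1)))


if __name__ == "__main__":
    # Hypothetical toy data: one frame, two articulatory features.
    variances = [[1.0, 3.0]]
    weights = relevance_weights(variances)
    error = weighted_reconstruction_error([[0.0, 0.0]], [[1.0, 1.0]], weights)
    print(weights, error)
```

With equal variances every weight is 1 and the error reduces to the ordinary sum-of-squares reconstruction error, which corresponds to the "standardly trained" baseline only in the loose sense of this sketch.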


doi: 10.21437/Interspeech.2013-346

Cite as: Canevari, C., Badino, L., Fadiga, L., Metta, G. (2013) Relevance-weighted-reconstruction of articulatory features in deep-neural-network-based acoustic-to-articulatory mapping. Proc. Interspeech 2013, 1297-1301, doi: 10.21437/Interspeech.2013-346

@inproceedings{canevari13_interspeech,
  author={Claudia Canevari and Leonardo Badino and Luciano Fadiga and Giorgio Metta},
  title={{Relevance-weighted-reconstruction of articulatory features in deep-neural-network-based acoustic-to-articulatory mapping}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={1297--1301},
  doi={10.21437/Interspeech.2013-346}
}