On two English corpora and one Italian corpus, we test a hybrid Deep Neural Network - Hidden Markov Model (DNN-HMM) phone recognition system that uses measured articulatory features as additional observations. The three corpora contain simultaneous recordings of speech acoustics and Electromagnetic Articulograph (EMA) data. We show that the additional articulatory features, reconstructed from speech acoustics through an Acoustic-to-Articulatory Mapping, reduce the phone error in all cases but one; in that single case, however, the reconstruction accuracy of the articulatory features is significantly lower than in all other cases. Error analysis shows that in all corpora the articulatory features improve the discrimination of almost all phonemes, although some phonemic categories are clearly more affected than others.
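The idea of appending reconstructed articulatory features to the acoustic observations can be illustrated with a minimal sketch. It is not the system evaluated here: the mapping below is a plain least-squares linear regression standing in for the Acoustic-to-Articulatory Mapping, the data are synthetic, and all function names and feature dimensions (13 acoustic, 6 articulatory) are assumptions chosen only for illustration.

```python
import numpy as np

def add_bias(frames):
    """Append a constant column so the linear map can learn an offset."""
    return np.hstack([frames, np.ones((frames.shape[0], 1))])

def fit_linear_aam(acoustic, articulatory):
    """Fit a least-squares map from acoustic frames to articulatory frames.

    Hypothetical stand-in for an Acoustic-to-Articulatory Mapping;
    a real system would typically use a nonlinear model.
    """
    weights, *_ = np.linalg.lstsq(add_bias(acoustic), articulatory, rcond=None)
    return weights

def append_reconstructed(acoustic, weights):
    """Reconstruct articulatory features and append them to each acoustic frame."""
    reconstructed = add_bias(acoustic) @ weights
    return np.hstack([acoustic, reconstructed])

# Synthetic training data: paired acoustic frames and measured articulatory frames.
rng = np.random.default_rng(0)
train_acoustic = rng.normal(size=(200, 13))
true_map = rng.normal(size=(13, 6))
train_articulatory = train_acoustic @ true_map + 0.01 * rng.normal(size=(200, 6))

weights = fit_linear_aam(train_acoustic, train_articulatory)

# At test time only acoustics are available; articulatory features are reconstructed.
test_acoustic = rng.normal(size=(50, 13))
observations = append_reconstructed(test_acoustic, weights)
```

Each row of `observations` is an acoustic frame extended with its reconstructed articulatory features, which is the kind of augmented observation vector a phone recognizer could then consume.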
Index Terms: Acoustic-to-Articulatory Mapping, Electromagnetic articulograph, EMA, Deep Neural Networks, phone recognition
Cite as: Canevari, C., Badino, L., Fadiga, L., Metta, G. (2013) Cross-corpus and cross-linguistic evaluation of a speaker-dependent DNN-HMM ASR system using EMA data. Proc. Speech Production in Automatic Speech Recognition (SPASR-2013), 29-33
@inproceedings{canevari13_spasr,
  author={Claudia Canevari and Leonardo Badino and Luciano Fadiga and Giorgio Metta},
  title={{Cross-corpus and cross-linguistic evaluation of a speaker-dependent DNN-HMM ASR system using EMA data}},
  year=2013,
  booktitle={Proc. Speech Production in Automatic Speech Recognition (SPASR-2013)},
  pages={29--33}
}