Multi-Corpus Acoustic-to-Articulatory Speech Inversion

Nadee Seneviratne, Ganesh Sivaraman, Carol Espy-Wilson


Several technologies, such as electromagnetic articulometry (EMA), ultrasound, real-time magnetic resonance imaging (MRI), and X-ray microbeam, are used to measure articulatory movements during speech. Each technique provides a different view of the vocal tract, and even measurements made with the same technique differ greatly due to differences in sensor placement and speaker anatomy. This limits most articulatory studies to single datasets. To yield better results in downstream applications, however, speech inversion systems should generalize more broadly, which requires combining data from multiple sources. This paper proposes a multi-task learning based deep neural network architecture for acoustic-to-articulatory speech inversion, trained using three different articulatory datasets: two measured with EMA and one with X-ray microbeam. Experiments show that the proposed acoustic-to-articulatory mapping is more accurate than systems trained on single datasets.
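The abstract does not detail the multi-task architecture, but the general idea behind multi-task learning across corpora is a shared acoustic encoder feeding one output head per dataset, so each corpus predicts its own articulatory targets while the encoder is trained on all of them. A minimal dependency-free sketch of a forward pass in that style (the dimensions, corpus names, and output sizes below are illustrative assumptions, not the paper's actual configuration):

```python
import random

random.seed(0)

def linear(x, W, b):
    # y = W x + b for a single input vector x
    return [sum(w * xi for w, xi in zip(row, x)) + bj
            for row, bj in zip(W, b)]

def relu(v):
    return [max(0.0, a) for a in v]

def init(rows, cols):
    # small random weight matrix (untrained; for illustration only)
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)]
            for _ in range(rows)]

# Hypothetical sizes: a 13-dim acoustic frame (e.g. MFCCs), one shared
# hidden layer, and one output head per articulatory corpus.
IN, HID = 13, 32
heads = {"EMA-A": 12, "EMA-B": 12, "XRMB": 16}  # target counts are made up

W_shared, b_shared = init(HID, IN), [0.0] * HID
W_heads = {name: (init(dim, HID), [0.0] * dim)
           for name, dim in heads.items()}

def invert(x, corpus):
    """Map one acoustic frame to the articulatory space of one corpus."""
    h = relu(linear(x, W_shared, b_shared))  # shared representation
    W, b = W_heads[corpus]
    return linear(h, W, b)                   # corpus-specific head

frame = [random.gauss(0, 1) for _ in range(IN)]
for corpus in heads:
    print(corpus, len(invert(frame, corpus)))
```

During training, losses from all heads would be backpropagated into the shared encoder, which is how data from incompatible measurement setups can still improve a single acoustic representation.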


 DOI: 10.21437/Interspeech.2019-3168

Cite as: Seneviratne, N., Sivaraman, G., Espy-Wilson, C. (2019) Multi-Corpus Acoustic-to-Articulatory Speech Inversion. Proc. Interspeech 2019, 859-863, DOI: 10.21437/Interspeech.2019-3168.


@inproceedings{Seneviratne2019,
  author={Nadee Seneviratne and Ganesh Sivaraman and Carol Espy-Wilson},
  title={{Multi-Corpus Acoustic-to-Articulatory Speech Inversion}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={859--863},
  doi={10.21437/Interspeech.2019-3168},
  url={http://dx.doi.org/10.21437/Interspeech.2019-3168}
}