Low Resource Acoustic-to-articulatory Inversion Using Bi-directional Long Short Term Memory

Aravind Illa, Prasanta Kumar Ghosh


Estimating articulatory movements from speech acoustic features is known as acoustic-to-articulatory inversion (AAI). A large amount of parallel acoustic-articulatory data is required for training an AAI model in a subject dependent manner, referred to as subject dependent AAI (SD-AAI). Electromagnetic articulography (EMA) is a promising technology for recording such parallel data, but it is expensive, time consuming and tiring for a subject. In order to reduce the demand for parallel acoustic-articulatory data in the AAI task for a subject, we, in this work, propose a subject-adaptive AAI method (SA-AAI), which adapts an existing AAI model trained with a large amount of parallel data from a fixed set of subjects. Experiments are performed with 30 subjects’ acoustic-articulatory data, and the AAI model is trained using a BLSTM network, to examine the amount of data needed from a new target subject for the SA-AAI to achieve an AAI performance equivalent to that of SD-AAI. Experimental results reveal that the proposed SA-AAI performs similarly to the SD-AAI with ∼62.5% less training data. Among the different articulators, the SA-AAI performance for the tongue articulators matches the corresponding SD-AAI performance with only ∼12.5% of the data used for SD-AAI training.
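The abstract describes mapping frame-level acoustic features to articulator trajectories with a BLSTM. Below is a minimal, hedged sketch of such a model in PyTorch; the feature dimensions (13 acoustic, 12 articulatory), hidden size, and layer count are illustrative assumptions, not values taken from the paper, and the subject-adaptation step (fine-tuning on a small amount of target-subject data) is only indicated in comments.

```python
# Minimal sketch of a BLSTM acoustic-to-articulatory inversion (AAI) model.
# Dimensions below are assumptions for illustration, not the paper's settings.
import torch
import torch.nn as nn

class BLSTMAAI(nn.Module):
    def __init__(self, acoustic_dim=13, articulatory_dim=12, hidden=128):
        super().__init__()
        # Bidirectional LSTM over the acoustic frame sequence.
        self.blstm = nn.LSTM(acoustic_dim, hidden, num_layers=2,
                             batch_first=True, bidirectional=True)
        # Linear projection from both LSTM directions to articulator positions.
        self.out = nn.Linear(2 * hidden, articulatory_dim)

    def forward(self, x):
        h, _ = self.blstm(x)   # h: (batch, frames, 2 * hidden)
        return self.out(h)     # frame-level articulator estimates

model = BLSTMAAI()
x = torch.randn(4, 200, 13)    # 4 utterances, 200 frames of acoustic features
y = model(x)                   # (4, 200, 12) articulator trajectories
# Subject adaptation (SA-AAI) would then fine-tune these pretrained weights
# on a small amount of parallel data from the new target subject.
```

In a subject-adaptation setup, one would typically train this network on the pooled multi-subject data and then continue training (possibly with a lower learning rate) on the limited target-subject data, rather than training from scratch.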


DOI: 10.21437/Interspeech.2018-1843

Cite as: Illa, A., Ghosh, P.K. (2018) Low Resource Acoustic-to-articulatory Inversion Using Bi-directional Long Short Term Memory. Proc. Interspeech 2018, 3122-3126, DOI: 10.21437/Interspeech.2018-1843.


@inproceedings{Illa2018,
  author={Aravind Illa and Prasanta Kumar Ghosh},
  title={Low Resource Acoustic-to-articulatory Inversion Using Bi-directional Long Short Term Memory},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={3122--3126},
  doi={10.21437/Interspeech.2018-1843},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1843}
}