An Investigation on Speaker Specific Articulatory Synthesis with Speaker Independent Articulatory Inversion

Aravind Illa, Prasanta Kumar Ghosh


Estimating speech representations from articulatory movements is known as articulatory-to-acoustic forward (AAF) mapping. Typically, this mapping is learned from directly measured articulatory movements in a subject-specific manner. Such AAF mapping has been shown to benefit speech synthesis applications. In this work, we investigate the speaker similarity and naturalness of utterances generated by an AAF mapping that is driven by the articulatory movements of a subject (referred to as the cross speaker) different from the speaker (the target speaker) used to train the AAF mapping. Experiments are performed with directly measured articulatory data from 9 speakers (8 target speakers and 1 cross speaker), recorded using an electromagnetic articulograph (AG501). Experiments are also performed with articulatory features estimated using a speaker-independent acoustic-to-articulatory inversion (SI-AAI) model trained on 26 reference speakers. Objective evaluation on the target speakers reveals that the articulatory features estimated by SI-AAI result in a lower Mel-cepstrum distortion than directly measured articulatory features. Further, listening tests reveal that the directly measured articulatory movements preserve speaker similarity better than the estimated ones. For naturalness, however, the articulatory movements predicted by SI-AAI perform better than the direct measurements.
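The objective measure named above, Mel-cepstrum distortion (MCD), can be illustrated with a minimal sketch. The abstract does not give the exact formulation used in the paper, so this assumes one commonly used definition: a per-frame distance in dB between time-aligned Mel-cepstral vectors, averaged over frames, with the 0th (energy) coefficient excluded. The function name `mel_cepstral_distortion` and the `skip_c0` parameter are hypothetical, chosen only for illustration.

```python
import numpy as np


def mel_cepstral_distortion(ref, est, skip_c0=True):
    """Average Mel-cepstrum distortion (dB) between two time-aligned sequences.

    ref, est: arrays of shape (num_frames, num_coeffs).
    skip_c0:  assumption for this sketch; drop the 0th (energy) coefficient.
    """
    ref = np.asarray(ref, dtype=float)
    est = np.asarray(est, dtype=float)
    if ref.shape != est.shape:
        raise ValueError("ref and est must be time-aligned with identical shapes")
    if skip_c0:
        ref, est = ref[:, 1:], est[:, 1:]
    # Squared Euclidean distance between cepstral vectors, one value per frame.
    diff_sq = np.sum((ref - est) ** 2, axis=1)
    # Convert each frame's distance to dB, then average over frames.
    per_frame_db = (10.0 / np.log(10.0)) * np.sqrt(2.0 * diff_sq)
    return float(np.mean(per_frame_db))


# Usage with synthetic data (not taken from the paper).
rng = np.random.default_rng(0)
reference = rng.normal(size=(100, 25))
estimate = reference + 0.05 * rng.normal(size=(100, 25))
print(mel_cepstral_distortion(reference, estimate))
```

Under this definition a lower value means the synthesized Mel-cepstra lie closer to the reference, which is the sense in which the SI-AAI features are reported to do better.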


 DOI: 10.21437/Interspeech.2019-2664

Cite as: Illa, A., Ghosh, P.K. (2019) An Investigation on Speaker Specific Articulatory Synthesis with Speaker Independent Articulatory Inversion. Proc. Interspeech 2019, 121-125, DOI: 10.21437/Interspeech.2019-2664.


@inproceedings{Illa2019,
  author={Aravind Illa and Prasanta Kumar Ghosh},
  title={{An Investigation on Speaker Specific Articulatory Synthesis with Speaker Independent Articulatory Inversion}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={121--125},
  doi={10.21437/Interspeech.2019-2664},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2664}
}