Automatic Visual Augmentation for Concatenation Based Synthesized Articulatory Videos from Real-time MRI Data for Spoken Language Training

Chandana S, Chiranjeevi Yarra, Ritu Aggarwal, Sanjeev Kumar Mittal, Kausthubha N K, Raseena K T, Astha Singh, Prasanta Kumar Ghosh


Concatenation-based articulatory video synthesis has previously been proposed for spoken language training to overcome the limitations of articulatory data recording. That approach uses real-time magnetic resonance imaging (rt-MRI) video image frames (IFs) containing articulatory movements. These IFs require visual augmentation to be easier to understand. In this work, we propose an augmentation method that uses the pixel intensities in the regions enclosed by the articulatory boundaries obtained from air-tissue boundaries (ATBs). Since the pixel intensities reflect the muscle movements in the articulators, coloring the IFs according to these intensities could yield realistic articulatory movements. However, manual ATB annotation is time-consuming; hence, we propose to synthesize ATBs from the ATBs of a few selected frames that have been used in synthesizing the articulatory videos. We augment a set of synthesized articulatory videos for 50 words obtained from the MRI-TIMIT database. A subjective evaluation of the augmented video quality with twenty-one subjects suggests that the augmented videos are visually more appealing than the respective synthesized rt-MRI videos, with a rating of 3.75 out of 5, where a score of 5 indicates excellent and a score of 1 indicates poor augmented video quality.
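As a rough illustration of intensity-driven coloring inside a boundary-enclosed region, the sketch below tints each pixel that falls inside a closed polygon with a base color scaled by that pixel's grayscale intensity, and leaves all other pixels gray. It is not the method of the paper, whose details are not given in the abstract: the polygon representation of the region, the linear intensity-to-color scaling, and the names `point_in_polygon`, `augment_frame`, `tongue_region` and `base_color` are all assumptions made for this example.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the closed polygon.

    `polygon` is a list of (x, y) vertices; the last vertex is joined
    back to the first. This stands in for an articulator region
    enclosed by boundaries (an assumption of this sketch).
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def augment_frame(frame, polygon, base_color):
    """Return an RGB copy of a grayscale frame (rows of 0-255 ints).

    Pixels whose centre lies inside `polygon` get `base_color` scaled
    linearly by the pixel intensity; other pixels stay gray.
    """
    augmented = []
    for row_idx, row in enumerate(frame):
        out_row = []
        for col_idx, intensity in enumerate(row):
            if point_in_polygon(col_idx + 0.5, row_idx + 0.5, polygon):
                scale = intensity / 255
                out_row.append(tuple(round(c * scale) for c in base_color))
            else:
                out_row.append((intensity, intensity, intensity))
        augmented.append(out_row)
    return augmented


# Hypothetical 4x4 frame and a square region covering its centre.
frame = [
    [0, 0, 0, 90],
    [0, 255, 128, 0],
    [0, 64, 200, 0],
    [0, 0, 0, 0],
]
tongue_region = [(1, 1), (3, 1), (3, 3), (1, 3)]
augmented = augment_frame(frame, tongue_region, base_color=(255, 100, 100))
```

Because the color is scaled by intensity rather than applied as a flat fill, frame-to-frame intensity changes inside the region remain visible after coloring, which is the property the abstract attributes to intensity-based augmentation.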


DOI: 10.21437/Interspeech.2018-1570

Cite as: S, C., Yarra, C., Aggarwal, R., Mittal, S.K., N K, K., K T, R., Singh, A., Ghosh, P.K. (2018) Automatic Visual Augmentation for Concatenation Based Synthesized Articulatory Videos from Real-time MRI Data for Spoken Language Training. Proc. Interspeech 2018, 3127-3131, DOI: 10.21437/Interspeech.2018-1570.


@inproceedings{S2018,
  author={Chandana S and Chiranjeevi Yarra and Ritu Aggarwal and Sanjeev Kumar Mittal and Kausthubha {N K} and Raseena {K T} and Astha Singh and Prasanta Kumar Ghosh},
  title={Automatic Visual Augmentation for Concatenation Based Synthesized Articulatory Videos from Real-time MRI Data for Spoken Language Training},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3127--3131},
  doi={10.21437/Interspeech.2018-1570},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1570}
}