ISCA Archive Interspeech 2021

Using Transposed Convolution for Articulatory-to-Acoustic Conversion from Real-Time MRI Data

Ryo Tanji, Hidefumi Ohmura, Kouichi Katsurada

We propose a deep neural network-based model for articulatory-to-acoustic conversion from real-time MRI (rtMRI) data. rtMRI can record the entire set of articulatory organs at high resolution, which is an advantage for articulatory-to-acoustic conversion, but its sampling rate is relatively low. To address this, we apply super-resolution in the temporal dimension using transposed convolution. Transposed convolution increases the resolution by inverting the resolution reduction performed by a standard CNN. To evaluate performance on data with different temporal resolutions, we conducted experiments on two datasets: USC-TIMIT and a Japanese rtMRI dataset. Evaluation with mel-cepstrum distortion and PESQ showed that transposed convolution is effective for generating accurate acoustic features. We also confirmed that increasing the magnification of the super-resolution improves the PESQ score.
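As a rough illustration of how a transposed convolution can raise temporal resolution, the following minimal sketch upsamples a short 1-D sequence along the time axis. It is not the authors' model: the function name `transposed_conv1d`, the fixed kernel values, and the toy input are all assumptions made for illustration, and in a trained network the kernel weights would be learned rather than hand-picked.

```python
def transposed_conv1d(x, kernel, stride):
    """Transposed 1-D convolution without padding (illustrative sketch).

    Each input sample is scaled by the kernel and written into the output
    at intervals of `stride`; overlapping contributions are summed.
    Output length is (len(x) - 1) * stride + len(kernel).
    """
    out_len = (len(x) - 1) * stride + len(kernel)
    y = [0.0] * out_len
    for i, xi in enumerate(x):
        for k, wk in enumerate(kernel):
            y[i * stride + k] += xi * wk
    return y


# Hypothetical low-rate feature track with three time steps.
frames = [1.0, 2.0, 3.0]

# Kernel size equal to the stride: each frame is simply repeated (2x rate).
repeated = transposed_conv1d(frames, kernel=[1.0, 1.0], stride=2)

# A triangular kernel yields linear interpolation between frames;
# only the final sample shows an edge effect from the missing next frame.
interpolated = transposed_conv1d(frames, kernel=[0.5, 1.0, 0.5], stride=2)
```

With a stride of 2 the output has roughly twice as many time steps as the input, so the stride plays the role of the magnification factor in this toy setting.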


doi: 10.21437/Interspeech.2021-906

Cite as: Tanji, R., Ohmura, H., Katsurada, K. (2021) Using Transposed Convolution for Articulatory-to-Acoustic Conversion from Real-Time MRI Data. Proc. Interspeech 2021, 3176-3180, doi: 10.21437/Interspeech.2021-906

@inproceedings{tanji21_interspeech,
  author={Ryo Tanji and Hidefumi Ohmura and Kouichi Katsurada},
  title={{Using Transposed Convolution for Articulatory-to-Acoustic Conversion from Real-Time MRI Data}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={3176--3180},
  doi={10.21437/Interspeech.2021-906}
}