Articulatory Synthesis Based on Real-Time Magnetic Resonance Imaging Data

Asterios Toutios, Tanner Sorensen, Krishna Somandepalli, Rachel Alexander, Shrikanth S. Narayanan


This paper presents a methodology for articulatory synthesis of running speech in American English driven by real-time magnetic resonance imaging (rtMRI) mid-sagittal vocal-tract data. At the core of the methodology is a time-domain simulation of sound propagation in the vocal tract, developed previously by Maeda. The first step of the methodology is the automatic derivation of air-tissue boundaries from the rtMRI data. These articulatory outlines are then systematically modified to introduce additional precision in the formation of consonantal vocal-tract constrictions. Other elements of the methodology include a previously reported set of empirical rules for setting the time-varying characteristics of the glottis and the velopharyngeal port, and a revised sagittal-to-area conversion. The results are a promising step toward a full-fledged text-to-speech synthesis system that leverages directly observed vocal-tract dynamics.


DOI: 10.21437/Interspeech.2016-596

Cite as

Toutios, A., Sorensen, T., Somandepalli, K., Alexander, R., Narayanan, S.S. (2016) Articulatory Synthesis Based on Real-Time Magnetic Resonance Imaging Data. Proc. Interspeech 2016, 1492-1496.

BibTeX
@inproceedings{Toutios+2016,
author={Asterios Toutios and Tanner Sorensen and Krishna Somandepalli and Rachel Alexander and Shrikanth S. Narayanan},
title={Articulatory Synthesis Based on Real-Time Magnetic Resonance Imaging Data},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-596},
url={http://dx.doi.org/10.21437/Interspeech.2016-596},
pages={1492--1496}
}