11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Statistical Multi-Stream Modeling of Real-Time MRI Articulatory Speech Data

Erik Bresch, Athanasios Katsamanis, Louis Goldstein, Shrikanth S. Narayanan

University of Southern California, USA

This paper investigates different statistical modeling frameworks for articulatory speech data obtained using real-time (RT) magnetic resonance imaging (MRI). To quantitatively capture the spatio-temporal shaping process of the human vocal tract during speech production, a multi-dimensional stream of image features is derived from the MRI recordings. The features are closely related, though not identical, to the tract variables commonly defined in articulatory phonology. The modeling of the shaping process aims to decompose the articulatory data streams into primitives by segmentation. The segmentation task is carried out using vector quantizers, Gaussian Mixture Models, Hidden Markov Models, and a coupled Hidden Markov Model. We evaluate the performance of the different segmentation schemes qualitatively with the help of a well-understood data set that was used in an earlier study of inter-articulatory timing phenomena of American English nasal sounds.
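As a rough illustration of the simplest segmentation scheme named in the abstract, the vector quantizer, the sketch below fits a small k-means codebook to a feature stream and treats each run of frames that share a codeword as one segment. It is not the authors' implementation. The synthetic two-dimensional stream, the codebook size, and the function names `kmeans_codebook`, `quantize`, and `segments_from_labels` are all assumptions made for this example.

```python
import numpy as np


def quantize(features, codebook):
    """Assign each frame to its nearest codeword (Euclidean distance)."""
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans_codebook(features, k, n_iter=20, seed=0):
    """Fit a k-entry codebook with plain k-means (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    # Initialise the codebook from k distinct frames of the stream.
    codebook = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = quantize(features, codebook)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # leave a codeword unchanged if its cell is empty
                codebook[j] = members.mean(axis=0)
    return codebook


def segments_from_labels(labels):
    """Collapse frame labels into (start, end, label) runs; end is exclusive."""
    boundaries = np.flatnonzero(np.diff(labels)) + 1
    starts = np.concatenate(([0], boundaries))
    ends = np.concatenate((boundaries, [len(labels)]))
    return [(int(s), int(e), int(labels[s])) for s, e in zip(starts, ends)]


# Hypothetical stand-in for an articulatory feature stream (not real MRI data):
# 90 frames, 2 features, moving from one plateau to another and back.
rng = np.random.default_rng(1)
plateaus = np.repeat(np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 0.0]]), 30, axis=0)
stream = plateaus + 0.1 * rng.standard_normal(plateaus.shape)

codebook = kmeans_codebook(stream, k=2)
segs = segments_from_labels(quantize(stream, codebook))
print(segs)
```

On this toy stream the quantizer yields three segments with boundaries at frames 30 and 60, and the first and last segments share a codeword. A vector quantizer of this kind labels every frame independently, with no model of temporal structure.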


Bibliographic reference.  Bresch, Erik / Katsamanis, Athanasios / Goldstein, Louis / Narayanan, Shrikanth S. (2010): "Statistical multi-stream modeling of real-time MRI articulatory speech data", In INTERSPEECH-2010, 1584-1587.