Sensorimotor Response to Visual Imagery of Tongue Displacement

William F. Katz, Divya Prabhakaran

To better understand audiovisual speech processing, we investigated the effects of viewing time-synchronized videos of a 3D tongue avatar on vowel production by healthy individuals. A group of 15 American English-speaking subjects heard pink noise over headphones and produced the word “head” under four viewing conditions: first, while viewing repetitions of the same vowel, /ε/ (baseline phase); then during a series of “morphed” videos shifting gradually from /ε/ to /æ/ (ramp phase); then during repetitions of /æ/ (maximum hold phase); and finally during repetitions of /ε/ (aftereffects phase). Results of a first formant frequency (F1) analysis indicated that the visual mismatch phases (ramp and maximum hold) caused all subjects to align their productions to the visually presented vowel, /æ/. No subjects reported being aware that their vowel quality had changed. We conclude that the visual moving tongue stimuli produced entrainment to the viewed vowel category, rather than adaptation in the direction opposite to the perturbation. Further experimentation is needed to determine whether these effects are due to inherent imitation behaviors or subjects’ lack of agency with the tongue avatar.
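The distinction between entrainment and adaptation comes down to the sign of the F1 change relative to the direction of the visual perturbation. The following is a minimal sketch of that comparison, not the authors' analysis: the function name `classify_response`, the 20 Hz threshold, and the F1 values are all hypothetical and chosen only for illustration. It assumes the standard phonetic fact that /æ/ has a higher F1 than /ε/, so a perturbation toward /æ/ is given a positive sign.

```python
from statistics import mean


def classify_response(baseline_f1, test_f1, perturbation_sign, threshold_hz=20.0):
    """Label a speaker's F1 change relative to a visual perturbation.

    baseline_f1, test_f1: lists of F1 measurements in Hz (hypothetical data).
    perturbation_sign: +1 if the viewed vowel has a higher F1 than the
        baseline vowel (e.g. /ε/ to /æ/), -1 if it has a lower F1.
    threshold_hz: assumed minimum shift treated as a real change.
    """
    shift = mean(test_f1) - mean(baseline_f1)

    if abs(shift) <= threshold_hz:
        label = "no change"
    elif (shift > 0) == (perturbation_sign > 0):
        # Production moved toward the viewed vowel.
        label = "entrainment"
    else:
        # Production moved away from the viewed vowel.
        label = "adaptation"
    return shift, label


if __name__ == "__main__":
    # Illustrative values only; these are not data from the study.
    baseline = [580, 590, 585]
    max_hold = [640, 650, 645]
    print(classify_response(baseline, max_hold, perturbation_sign=+1))
```

With a real data set, the same comparison would be made per subject and per phase, and the threshold would be replaced by an appropriate statistical test.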

DOI: 10.21437/Interspeech.2016-1594

Cite as

Katz, W.F., Prabhakaran, D. (2016) Sensorimotor Response to Visual Imagery of Tongue Displacement. Proc. Interspeech 2016, 2090-2094.

author={William F. Katz and Divya Prabhakaran},
title={Sensorimotor Response to Visual Imagery of Tongue Displacement},
booktitle={Interspeech 2016},