One of the most daunting tasks of a listener is to map a continuous
auditory stream onto known speech sound categories and lexical items.
A major issue with this mapping problem is the variability in the acoustic
realizations of sound categories, both within and across speakers.
Past research has suggested listeners may use visual information (e.g.,
lip-reading) to calibrate these speech categories to the current speaker.
Previous studies have focused on audiovisual recalibration of consonant
categories. The present study explores whether vowel categorization,
which is known to show less sharply defined category boundaries, also
benefits from visual cues.
Participants were
exposed to videos of a speaker pronouncing one of two vowels, paired
with audio that was ambiguous between the two vowels. After exposure,
participants showed recalibration of their vowel categories.
In addition, individual variability in audiovisual recalibration is
discussed. It is suggested that listeners’ category sharpness
may be related to the weight they assign to visual information in audiovisual
speech perception. Specifically, listeners with less sharply defined categories
appear to assign more weight to visual information during audiovisual speech
recognition.
Cite as: Franken, M.K., Eisner, F., Schoffelen, J.-M., Acheson, D.J., Hagoort, P., McQueen, J.M. (2017) Audiovisual Recalibration of Vowel Categories. Proc. Interspeech 2017, 655-658, doi: 10.21437/Interspeech.2017-122