Syllabic tone perception in Vietnamese

Hansjörg Mixdorff, Mai Chi Lirong, Dung Tien Nguyen, Denis Burnham

This paper examines tone perception in Vietnamese under audio-only and audio-plus-video conditions. The lower part of a male speaker's face was recorded on digital video as he uttered 22 sets of syllabic tokens covering the six tones of Vietnamese. In a perception study, the stimuli were presented either as audio alone or as audio plus video to native Vietnamese speakers, who were asked to identify the tone they perceived. Audio was presented in several conditions: clear, masked by babble noise at different SNR levels, and a devoiced noise condition produced using LPC resynthesis. In the devoiced and the clear audio conditions, adding video yielded little improvement over audio alone. However, the addition of visual information did significantly improve perception in the babble-noise masked condition at an SNR of -9 dB. This outcome suggests that the improvement in noise-masked conditions is not due to additional information in the video per se, but rather to an effect of early integration of acoustic and visual cues facilitating auditory-visual speech perception.

