Auditory-Visual Speech Processing 2005
British Columbia, Canada
This paper presents results on the use of visual cues in the perception of Thai tones. The lower part of a female speaker's face was recorded on digital video as she uttered 24 sets of syllabic tokens covering the five tones of Thai. In a perception study, the audio track alone and the audio combined with video were presented to native Thai speakers, who were asked to decide which tone they perceived. Audio was presented under several conditions: clear, masked with pink noise at different SNR levels, and devoiced using LPC resynthesis. Some subjects were presented with a silent, video-only condition. In the devoiced and clear audio conditions, adding video yielded little improvement. However, visual information significantly improved perception in the pink-noise masked condition, and the effect increased with decreasing SNR. Results for the video-only condition are close to chance, suggesting that the improvement in the noise-masked conditions is not due to additional information in the video per se, but rather to early integration of acoustic and visual cues that facilitates auditory-visual speech perception.
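As an illustration of what masking "at different SNR levels" involves, the following minimal sketch scales a noise signal so that the speech-to-noise power ratio hits a target value in dB before the two are added. It is not the authors' procedure: the function names (`mean_power`, `mix_at_snr`, `measured_snr_db`), the SNR values, the 200 Hz test tone, and the use of white Gaussian noise as a stand-in for pink noise are all assumptions made for this example.

```python
import math
import random


def mean_power(samples):
    """Average squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it.

    Returns (mixed, scaled_noise). Hypothetical helper, not from the paper.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_noise == 0.0:
        raise ValueError("noise has zero power")
    # SNR_dB = 10 * log10(P_speech / (g^2 * P_noise)); solve for the gain g.
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled)]
    return mixed, scaled


def measured_snr_db(speech, noise):
    """SNR in dB computed from the mean powers of the two signals."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


if __name__ == "__main__":
    rng = random.Random(0)
    sample_rate = 16000  # assumed value for this example
    # Stand-ins: a 200 Hz tone for speech, white Gaussian noise for the masker.
    speech = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(sample_rate)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]
    for target in (0.0, -6.0, -12.0):  # hypothetical SNR levels
        _, scaled = mix_at_snr(speech, noise, target)
        print(target, round(measured_snr_db(speech, scaled), 6))
```

Lower target values make the masker stronger relative to the speech, which is the regime in which the abstract reports the largest benefit from adding video. A real replication would substitute recorded syllables and a pink-noise source for the stand-in signals.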
Bibliographic reference. Mixdorff, Hansjörg / Charnvivit, Patavee / Burnham, Denis K. (2005): "Auditory-visual perception of syllabic tones in Thai", in AVSP-2005, 3-8.