Interspeech'2005 - Eurospeech
This paper presents results concerning the exploitation of visual cues in the perception of Mandarin tones. The lower part of a female speaker's face was recorded on digital video as she uttered 25 sets of syllabic tokens covering the four tones of Mandarin. In a perception study, the audio track alone as well as an audio-plus-video condition were then presented to native Mandarin speakers, who were required to decide which tone they perceived. Audio was presented in various conditions: clear, babble-noise masked at different SNR levels, and devoiced and amplitude-modulated noise conditions using LPC resynthesis. In the devoiced and the clear audio conditions, adding video provides little improvement over audio alone. However, the addition of visual information did significantly improve perception in the babble-noise masked condition, and this effect increased with decreasing SNR. This outcome suggests that the improvement in noise-masked conditions is not due to additional information in the video per se, but rather to an effect of early integration of acoustic and visual cues facilitating auditory-visual speech perception.
Bibliographic reference. Mixdorff, Hansjörg / Hu, Yu / Burnham, Denis (2005): "Visual cues in Mandarin tone perception", In INTERSPEECH-2005, 405-408.