Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Visual Lipreading of Voicing for French Stop Consonants

C. Colin (1), Monique Radeau (1,2), Didier Demolin (1), A. Soquet (1)

(1) Free University of Brussels, Belgium
(2) National Fund for Scientific Research, Brussels, Belgium

This study examined whether visually presented bilabial consonants are better identified than velars in a CV (C = consonant; V = vowel) or VCV context. We also investigated whether voiced and voiceless consonants sharing the same place and manner of articulation can be distinguished from each other using visual cues alone. Although voicing is generally assumed to be conveyed mainly through the auditory modality, one cannot rule out the possibility that the production of a voiced stop consonant generates a pattern of facial cues that is visually detectable. Two pairs of stop consonants (/b/-/p/ and /g/-/k/) were articulated by a male and a female speaker in two syllabic contexts (CV monosyllables or VCV bisyllables). The bisyllables were uttered at three speaking rates: slow, medium and fast. The materials were edited onto videotape and presented on a TV screen without sound. After each trial, participants had to choose, from several written alternatives, the one that matched what they had perceived. Correct identifications reached 42% on average across the four consonants. Errors consisted mostly of voicing confusions (37%). Place-of-articulation confusions occurred in only 8% of cases. Correct identifications were more frequent for bilabials than for velars, particularly in monosyllables. Voiced consonants were better identified than voiceless ones in both syllabic contexts, especially for velars. This suggests that some voicing distinction is possible on the basis of visual cues.

