EUROSPEECH 2001 Scandinavia
The long-term goal of our work is to predict visual confusion matrices from physical measurements. In this paper, four talkers recorded 69 American English consonant-vowel (CV) syllables while audio, video, and facial movements were captured. During the recording, 20 markers were placed on the talker's face, and an optical Qualisys system tracked the three-dimensional facial movements. The videotapes (showing the markers on the face, and presented without sound) were shown to normal-hearing viewers with average or above-average lipreading ability, and visual confusion matrices were obtained from their responses. Results showed that the facial measurements correlated with the visual perception data at about 0.79 and accounted for about 63% of the variance.
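The two figures reported above are linked: for a simple linear relationship, the proportion of variance accounted for is the square of the Pearson correlation coefficient. The sketch below illustrates that relationship only. It is not the authors' analysis; the function names and the sample values are hypothetical and chosen purely for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def variance_accounted_for(r):
    """Proportion of variance explained by a linear fit with correlation r."""
    return r * r


if __name__ == "__main__":
    # Hypothetical values, NOT data from the paper: one physical distance and
    # one perceptual dissimilarity score per syllable pair.
    physical_distance = [0.2, 0.5, 0.9, 1.4, 1.8]
    perceptual_dissimilarity = [0.1, 0.4, 0.5, 0.9, 0.8]
    r = pearson_r(physical_distance, perceptual_dissimilarity)
    print(f"r = {r:.2f}, variance accounted for = {variance_accounted_for(r):.2f}")

    # The correlation reported in the abstract, squared.
    print(f"0.79 squared = {variance_accounted_for(0.79):.4f}")
```

Squaring the rounded correlation of 0.79 gives roughly 0.62, which agrees with the reported figure of about 63% once rounding of the correlation is taken into account.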
Bibliographic reference. Jiang, Jintao / Alwan, Abeer / Auer, Edward T. / Bernstein, Lynne E. (2001): "Predicting visual consonant perception from physical measures", In EUROSPEECH-2001, 179-182.