FAAVSP - The 1st Joint Conference on Facial Analysis, Animation, and Auditory-Visual Speech Processing
We know that an audio speech signal can be unambiguously decoded by any native speaker of the language it is uttered in, provided that it meets some quality conditions. But we do not know whether this is the case with visual speech, because the process of lipreading is rather mysterious and seems to rely heavily on the use of context and non-speech cues. How much information about the speech content is there in a visual speech signal? We attempt to answer this question by discovering matching segments of phoneme sequences that represent recurring words and phrases in audio and visual representations of the same speech, using a modified version of the segmental dynamic programming technique introduced by Park and Glass. Comparison of the results shows that visual speech displays rather less matching content than the audio, and reveals some interesting differences in the phonetic content of the information recovered by the two modalities.

Index Terms: automatic lip reading, visual speech processing, speech recognition
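The segmental dynamic programming of Park and Glass can be pictured as DTW restricted to diagonal bands of a frame-level distance matrix: each band is aligned independently, and bands whose best alignment path has low average distortion mark candidate recurring segments. The sketch below is only a minimal illustration of that idea under assumed choices (cosine distance, one band per diagonal offset, a hypothetical band width R and threshold); it is not the paper's actual implementation.

```python
import numpy as np

def cosine_dist(a, b):
    """Cosine distance matrix between rows of a (n, d) and b (m, d)."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-12)
    return 1.0 - a @ b.T

def band_dtw(D, offset, R):
    """DTW over D, constrained to the diagonal band |i - j + offset| <= R.
    Returns the average per-step cost of the best path from the band's
    start cell to the far edge (np.inf if no path fits the band)."""
    n, m = D.shape
    start = (0, offset) if offset >= 0 else (-offset, 0)
    if start[0] >= n or start[1] >= m:
        return np.inf
    acc = np.full((n, m), np.inf)          # accumulated cost
    cnt = np.zeros((n, m), dtype=int)      # path length, for averaging
    in_band = lambda i, j: abs((i - start[0]) - (j - start[1])) <= R
    acc[start] = D[start]
    cnt[start] = 1
    for i in range(start[0], n):
        for j in range(start[1], m):
            if (i, j) == start or not in_band(i, j):
                continue
            prev = [(acc[p], cnt[p])
                    for p in ((i - 1, j), (i, j - 1), (i - 1, j - 1))
                    if p[0] >= start[0] and p[1] >= start[1]
                    and np.isfinite(acc[p])]
            if prev:
                a, c = min(prev)
                acc[i, j] = a + D[i, j]
                cnt[i, j] = c + 1
    # Best average distortion over cells on the far edge of the matrix.
    ends = [(i, j) for i in range(start[0], n) for j in range(start[1], m)
            if (i == n - 1 or j == m - 1) and np.isfinite(acc[i, j])]
    if not ends:
        return np.inf
    return min(acc[e] / cnt[e] for e in ends)

def matching_bands(x, y, R=5, thresh=0.2):
    """Scan all diagonal bands; offsets whose best path falls below
    thresh are candidate recurring-segment matches (thresh is illustrative)."""
    D = cosine_dist(x, y)
    return [o for o in range(-(x.shape[0] - 1), y.shape[0])
            if band_dtw(D, o, R) < thresh]
```

On identical feature sequences the zero-offset band aligns perfectly, so its average distortion is (near) zero; dissimilar sequences leave all bands above the threshold.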
Bibliographic reference. Cox, Stephen (2015): "Discovering patterns in visual speech", In FAAVSP-2015, 121-126.