Speech is conventionally regarded as a purely acoustic signal. Consequently, speech research has been devoted largely to unimodal investigations of its audible component. However, speech has a second component: the movements of speakers' faces associated with the production of the acoustic speech signal constitute a visible speech signal that can also convey important perceptual cues. The auditory and visual cues tend to be complementary. For example, cues to the place of articulation are frequently easy to distinguish visually and difficult to distinguish acoustically, whilst cues to the manner of articulation are often easy to distinguish acoustically and difficult to distinguish visually.

It has been shown that seeing the face of a talker as well as hearing the talker's voice can significantly improve speech intelligibility, particularly in a noisy environment. What a noisy environment is for the normal-hearing corresponds to the permanent situation facing the large number of people who suffer from some degree of hearing impairment. In both cases, the everyday business of speech communication depends (in the latter case, critically) upon an ability to supplement what little can be heard with what can be seen. This is the basis of the skill of lipreading, the acquisition of which can play a central part in the rehabilitation of the hearing-impaired.