Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Perceptual Interfaces for Information Interaction: Joint Processing of Audio and Visual Information for Human-Computer Interaction

Chalapathi Neti, Giridharan Iyengar, Gerasimos Potamianos, A. Senior, B. Maison

IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA

We exploit the human perceptual principle of sensory integration (the joint use of audio and visual information) to improve the recognition of human activity (speech recognition, speech event detection, and speaker change detection), intent (intent to speak), and identity (speaker recognition), particularly when the audio is degraded by noise and channel effects. In this paper, we present experimental results from a variety of contexts that demonstrate the benefit of joint audio-visual processing.

Bibliographic reference. Neti, Chalapathi / Iyengar, Giridharan / Potamianos, Gerasimos / Senior, A. / Maison, B. (2000): "Perceptual interfaces for information interaction: joint processing of audio and visual information for human-computer interaction", in ICSLP-2000, vol. 3, 11-14.