Sixth International Conference on Spoken Language Processing
October 16-20, 2000
Perceptual Interfaces for Information Interaction: Joint Processing of Audio and Visual Information for Human-Computer Interaction
Chalapathi Neti, Giridharan Iyengar, Gerasimos Potamianos, A. Senior, B. Maison
IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA
We are exploiting the human perceptual principle of sensory
integration (the joint use of audio and visual information)
to improve the recognition of human activity (speech recognition,
speech event detection and speaker change), intent (intent to
speak) and human identity (speaker recognition), particularly in
the presence of acoustic degradation due to noise and channel distortion. In
this paper, we present experimental results in a variety of contexts
that demonstrate the benefit of joint audio-visual processing.
Neti, Chalapathi / Iyengar, Giridharan / Potamianos, Gerasimos / Senior, A. / Maison, B. (2000):
"Perceptual interfaces for information interaction: joint processing of audio and visual information for human-computer interaction",
In ICSLP-2000, vol. 3, 11-14.