International Workshop on Hands-Free Speech Communication (HSC2001), April 9-11, 2001
Demand has grown recently for Automatic Speech Recognition (ASR) systems that operate robustly in acoustically noisy environments. This paper proposes a method for effectively integrating audio and visual information in audio-visual (bi-modal) ASR systems. For such integration, two issues are important: (1) the synchronization of the audio and visual information, and (2) the optimization of the system for its environment. Regarding (1), the speech and lip-movement features are correlated but offset from each other by a time lag. To address this problem, we first introduce an integration method using HMM composition. Second, to address (2), we examine whether the GPD algorithm can adaptively estimate the stream weights. Evaluation experiments show that the proposed system achieves higher recognition accuracy on noisy speech than audio-only, visual-only, and conventional audio-visual ASR systems.
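As a rough illustration of the two ideas named in the abstract, the sketch below pairs audio and visual HMM states into a composite (product) state space and combines the per-stream log-likelihoods with a stream weight. It assumes the common multi-stream form, in which the combined score is a weighted sum of the two log-likelihoods with weights summing to one; the abstract does not state the paper's exact formulation. The function names, state labels, and numeric values are hypothetical, and the weight is fixed by hand here rather than estimated with GPD.

```python
import itertools
import math


def product_states(audio_states, visual_states):
    """Build the composite state space: every (audio, visual) state pair.

    Pairing the states lets the two streams advance at different times,
    which is how a product HMM tolerates a lag between them.
    """
    return list(itertools.product(audio_states, visual_states))


def combined_log_likelihood(log_b_audio, log_b_visual, audio_weight):
    """Stream-weighted score: w * log b_A + (1 - w) * log b_V.

    ASSUMPTION: the weights sum to one. This is a common convention,
    not something confirmed by the abstract.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * log_b_audio + (1.0 - audio_weight) * log_b_visual


# Hypothetical 3-state audio model and 2-state visual model.
states = product_states(["a1", "a2", "a3"], ["v1", "v2"])

# Hypothetical per-stream observation likelihoods for one composite state.
score = combined_log_likelihood(math.log(0.2), math.log(0.5), audio_weight=0.7)
```

In this form, an audio weight near 1 makes the system rely on the audio stream, and a weight near 0 makes it rely on the visual stream. Estimating the weight adaptively, as the abstract describes, would replace the hand-set value used here.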
Bibliographic reference. Kumatani, Kennichi / Nakamura, Satoshi / Shikano, Kiyohiro (2001): "An adaptive integration method based on product HMM for bi-modal speech recognition", In HSC2001, 195-198.