International Workshop on Hands-Free Speech Communication (HSC2001), April 9-11, 2001
This paper proposes bimodal speech recognition using lip movements measured by optical-flow analysis. Optical flow is defined as the distribution of apparent velocities of brightness-pattern movements in an image. Since the optical flow can be computed without extracting the speaker's lip contours or location, robust visual features of lip movement can be obtained. Our method calculates two visual features in each frame: the variances of the horizontal and vertical components of the flow velocities. Since these features represent movement of the speaker's mouth, they are especially useful for estimating pause/silence periods in noise-corrupted speech. The visual features are combined with acoustic features in the framework of HMM-based recognition. Phoneme HMMs are trained using the combined features extracted from clean speech data. In recognizing noise-corrupted speech, the observation probabilities of the visual features are weighted. Experiments were carried out using audio-visual data from 11 male speakers uttering connected digits. Combining visual information only for the silence HMM achieved the following improvements in word accuracy over the audio-only recognition scheme: 5% at SNR = 5 dB and 12% at SNR = 10 dB.
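As a rough illustration of the two per-frame visual features described above, the following minimal sketch computes the variances of the horizontal and vertical flow components from an already-estimated flow field. The function names, the nested-list input layout, and the use of population variance are assumptions made for this example. The abstract also does not state how the visual observation probabilities are weighted; the log-linear stream weighting shown here is a common multi-stream HMM formulation and is used only as a stand-in.

```python
from statistics import pvariance


def flow_variance_features(flow):
    """Return (var_horizontal, var_vertical) for one frame.

    `flow` is a hypothetical layout: a list of rows, each row a list of
    (u, v) velocity pairs, one pair per pixel or grid point.
    Population variance is an assumption; the abstract does not specify it.
    """
    us = [u for row in flow for (u, _v) in row]
    vs = [v for row in flow for (_u, v) in row]
    return pvariance(us), pvariance(vs)


def weighted_log_likelihood(log_b_audio, log_b_visual, w_audio, w_visual):
    """Combine per-stream log observation probabilities with stream weights.

    Assumed form: w_audio * log b_audio + w_visual * log b_visual.
    """
    return w_audio * log_b_audio + w_visual * log_b_visual


# Toy frames: a motionless mouth region and one with opposing motion vectors.
still = [[(0.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)]]
moving = [[(1.0, -2.0), (-1.0, 2.0)], [(1.0, 2.0), (-1.0, -2.0)]]

still_features = flow_variance_features(still)
moving_features = flow_variance_features(moving)
combined = weighted_log_likelihood(-2.0, -4.0, 0.5, 0.5)
```

In this toy case the motionless frame yields zero variance in both directions, while the moving frame yields nonzero variances, which is the property that makes such features informative about pause/silence periods.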
Bibliographic reference. Iwano, Koji / Tamura, Satoshi / Furui, Sadaoki (2001): "Bimodal speech recognition using lip movement measured by optical-flow analysis", In HSC2001, 187-190.