September 22-25, 1997
This paper presents methods to improve speech recognition accuracy by incorporating automatic lip reading. Lip reading accuracy is improved through the following approaches: 1) collection of an image-and-speech synchronous database of 5240 words, 2) feature extraction of 2-dimensional power spectra around the mouth, and 3) sub-word unit HMMs with tied-mixture distributions (tied-mixture HMMs). Experiments on a 100-word test set show 85% accuracy by lip reading alone. It is also shown that tied-mixture HMMs improve lip reading accuracy. Speech recognition experiments integrating audio-visual information are carried out over various SNRs. The results show that the integration always achieves better performance than using either audio or visual information alone.
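The abstract does not spell out how the audio and visual streams are combined. A common approach in bimodal recognition of this era is a weighted combination of the per-word audio and visual HMM log-likelihoods, with the weight tuned to the SNR. The sketch below illustrates that idea only; the function names, the toy scores, and the fixed weight are assumptions for illustration, not the paper's actual scheme.

```python
def fuse_log_likelihoods(audio_ll, visual_ll, audio_weight):
    """Weighted sum of audio and visual HMM log-likelihoods for one word.

    audio_weight in [0, 1]: 1.0 trusts audio only (clean speech),
    0.0 trusts the lip-reading stream only (very low SNR).
    """
    return audio_weight * audio_ll + (1.0 - audio_weight) * visual_ll


def recognize(word_scores, audio_weight):
    """Pick the word with the best fused score.

    word_scores: dict mapping word -> (audio_log_likelihood,
    visual_log_likelihood), e.g. from separate audio and visual HMMs.
    """
    return max(
        word_scores,
        key=lambda w: fuse_log_likelihoods(*word_scores[w], audio_weight),
    )


# Hypothetical scores: audio favors "hai", the visual stream favors "iie".
scores = {"hai": (-10.0, -30.0), "iie": (-12.0, -20.0)}
print(recognize(scores, 1.0))  # audio-only decision
print(recognize(scores, 0.0))  # visual-only decision
```

At intermediate weights the decision can flip between the two streams, which is why an SNR-dependent weight lets the combined system match or beat the better single modality at every noise level.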
Bibliographic reference. Nakamura, Satoshi / Nagai, Ron / Shikano, Kiyohiro (1997): "Improved bimodal speech recognition using tied-mixture HMMs and 5000 word audio-visual synchronous database", In EUROSPEECH-1997, 1623-1626.