Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Bimodal Speech Recognition Using Coupled Hidden Markov Models

Stephen M. Chu, Thomas S. Huang

Beckman Institute and Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, USA

In this paper we present a bimodal speech recognition system in which the audio and visual modalities are modeled and integrated using coupled hidden Markov models (CHMMs). CHMMs are probabilistic inference graphs that have hidden Markov models as sub-graphs. Chains in the corresponding inference graph are coupled through matrices of conditional probabilities modeling temporal influences between their hidden state variables. The coupling probabilities are both cross chain and cross time. The later is essential for allowing temporal influences between chains, which is important in modeling bimodal speech. Our bimodal speech recognition system employs a two-chain CHMM, with one chain being associated with the acoustic observations, the other with the visual features. A deterministic approximation for maximum a posteriori (MAP) estimation is used to enable fast classification and parameter estimation. We evaluated the system on a speaker independent connected-digit task. Comparing with an acoustic-only ASR system trained using only the audio channel of the same database, the bimodal system consistently demonstrates improved noise robustness at all SNRs. We further compare the CHMM system reported in this paper with our earlier bimodal speech recognition system in which the two modalities are fused by concatenating the audio and visual features. The recognition results clearly show the advantages of the CHMM framework in the context of bimodal speech recognition.

Full Paper

Bibliographic reference.  Chu, Stephen M. / Huang, Thomas S. (2000): "Bimodal speech recognition using coupled hidden Markov models", In ICSLP-2000, vol.2, 747-750.