ISCA Archive ICSLP 2000

Audio-visual speech recognition using MCE-based HMMs and model-dependent stream weights

Chiyomi Miyajima, Keiichi Tokuda, Tadashi Kitamura

This paper presents a framework for designing a hidden Markov model (HMM)-based audio-visual automatic speech recognition (ASR) system using minimum classification error (MCE) training. The audio and visual HMM parameters are optimized with the generalized probabilistic descent (GPD) method, and their likelihoods are combined using model-dependent stream weights, which are also estimated with the GPD method. Speaker-independent isolated word recognition experiments show that audio-visual ASR performance improves significantly with GPD optimization of the audio and visual HMMs and the introduction of model-dependent stream weights. The result is a 47% to 81% error reduction over a conventional system consisting of HMMs trained under the maximum likelihood criterion and globally tied stream weights estimated with the GPD method.
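
The abstract does not give the formulas, so the following is only a minimal sketch of the general idea, under stated assumptions: each word model's audio and visual log-likelihoods are combined linearly, the two stream weights of a model sum to one, and the MCE loss is the commonly used sigmoid of a misclassification measure. All function names, the smoothing parameters `eta` and `gamma`, and the toy scores are hypothetical and are not taken from the paper.

```python
import math


def combined_score(log_lik_audio, log_lik_visual, audio_weight):
    """Weighted sum of stream log-likelihoods (assumes the weights sum to 1)."""
    return audio_weight * log_lik_audio + (1.0 - audio_weight) * log_lik_visual


def score_all(audio_scores, visual_scores, audio_weights):
    """Combined score per word; audio_weights holds one weight per word model."""
    return {
        word: combined_score(audio_scores[word], visual_scores[word], audio_weights[word])
        for word in audio_scores
    }


def recognize(audio_scores, visual_scores, audio_weights):
    """Pick the word whose model has the highest combined score."""
    scores = score_all(audio_scores, visual_scores, audio_weights)
    return max(scores, key=scores.get)


def mce_loss(scores, correct_word, eta=1.0, gamma=1.0):
    """Smoothed classification error (assumed standard MCE form).

    d = -g_correct + (1/eta) * log(mean_j exp(eta * g_j)) over competitors j,
    loss = sigmoid(gamma * d). A loss below 0.5 means the correct word wins.
    """
    competitors = [s for word, s in scores.items() if word != correct_word]
    top = max(competitors)  # subtract the max for numerical stability
    mean_exp = sum(math.exp(eta * (s - top)) for s in competitors) / len(competitors)
    d = -scores[correct_word] + top + math.log(mean_exp) / eta
    return 1.0 / (1.0 + math.exp(-gamma * d))


# Toy log-likelihoods for a two-word vocabulary (made-up numbers).
audio = {"yes": -10.0, "no": -12.0}
visual = {"yes": -8.0, "no": -5.0}

# Globally tied weight: every model shares the same audio weight.
tied = {"yes": 0.7, "no": 0.7}
# Model-dependent weights: each word model has its own audio weight.
per_model = {"yes": 0.7, "no": 0.3}

print(recognize(audio, visual, tied))
print(recognize(audio, visual, per_model))
print(mce_loss(score_all(audio, visual, tied), "yes"))
```

In a GPD-style procedure, the loss above would be differentiated with respect to the HMM parameters and the stream weights, and each would be updated by small gradient steps per training sample; that update loop is omitted here.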


Cite as: Miyajima, C., Tokuda, K., Kitamura, T. (2000) Audio-visual speech recognition using MCE-based HMMs and model-dependent stream weights. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 2, pp. 1023-1026.

@inproceedings{miyajima00_icslp,
  author={Chiyomi Miyajima and Keiichi Tokuda and Tadashi Kitamura},
  title={{Audio-visual speech recognition using MCE-based HMMs and model-dependent stream weights}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  volume={2},
  pages={1023--1026}
}