FAAVSP - The 1st Joint Conference on
Facial Analysis, Animation, and
This paper examines stream weight optimization for multi-modal speech recognition that uses both audio and visual information. A conventional multi-stream Hidden Markov Model (HMM) for multi-modal speech recognition constrains the audio and visual weight factors to sum to one, which fixes the balance between the HMM's transition and observation probabilities. We study an effective indicator for estimating the weights when this constraint is released. Recognition experiments were conducted using the audio-visual corpus CENSREC-1-AV. In noisy environments, releasing the constraint was shown to improve recognition accuracy. Stream weights based on a higher-order statistical parameter (kurtosis) were then proposed and tested, and the recognition experiments showed that the proposed stream weights are successful.
Index Terms: stream weight optimization, multi-modal speech recognition, kurtosis, multi-stream HMM.
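To make the role of the sum-to-one constraint concrete, the following is a minimal sketch, not the authors' implementation. It assumes the usual multi-stream formulation in which the per-frame observation log-likelihood is a weighted sum of the audio and visual log-likelihoods, and it computes sample excess kurtosis with the standard moment formula. The function names and the numeric values are hypothetical, and the mapping from kurtosis to stream weights is not given in the abstract, so it is deliberately left out.

```python
import math


def combined_log_likelihood(log_b_audio, log_b_visual, w_audio, w_visual):
    """Weighted sum of per-stream observation log-likelihoods (assumed form)."""
    return w_audio * log_b_audio + w_visual * log_b_visual


def frame_score(log_transition, log_b_audio, log_b_visual, w_audio, w_visual):
    """Per-frame path score: transition term plus weighted observation term."""
    return log_transition + combined_log_likelihood(
        log_b_audio, log_b_visual, w_audio, w_visual
    )


def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (zero for a Gaussian)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / (m2 * m2) - 3.0


if __name__ == "__main__":
    # Hypothetical log-probabilities for one frame, for illustration only.
    log_a, log_ba, log_bv = -1.0, -10.0, -20.0

    # Constrained weights: w_audio + w_visual == 1.
    constrained = frame_score(log_a, log_ba, log_bv, 0.7, 0.3)

    # Same audio/visual ratio, but the sum is 2: the observation term is
    # doubled relative to the unchanged transition term.
    released = frame_score(log_a, log_ba, log_bv, 1.4, 0.6)

    print(constrained, released)
    print(excess_kurtosis([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Under the constraint, only the ratio between the two streams can change. Releasing it also lets the overall scale of the observation term vary against the transition term, which is the balance the abstract describes as fixed.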
Bibliographic reference. Ukai, Kazuto / Tamura, Satoshi / Hayamizu, Satoru (2015): "Stream weight estimation using higher order statistics in multi-modal speech recognition", In FAAVSP-2015, 181-184.