Auditory-Visual Speech Processing 2007 (AVSP2007)
Kasteel Groenendaal, Hilvarenbeek, The Netherlands
This paper presents a multimodal person identification system based on a combination of audio and visual classifiers. The audio classifier was built using mel-frequency cepstral coefficient features and Gaussian mixture models. The visual classifier used Haar-like features and the AdaBoost algorithm for face detection, and principal component analysis for identification. A new method is proposed to estimate the optimal weighting parameter for combining the two classifiers, based on probability density function estimation under Gaussian assumptions. Simulations indicate that the proposed method obtains slightly better results than the frequently used empirical method of optimising on held-out training data.
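The abstract does not state how the weighting parameter is estimated, so the following is only an illustrative sketch and not the authors' procedure. It assumes a weighted-sum fusion s = w * audio + (1 - w) * visual, assumes that the scores of the true identity and of other identities are independent and Gaussian in each modality, and picks w by grid search to maximise a d-prime-style separation between the two fused distributions. All function names and the separation criterion are hypothetical choices made for this example.

```python
import math
from statistics import mean, pvariance


def gaussian_stats(scores):
    """Fit a Gaussian to a list of scores; returns (mean, variance)."""
    return mean(scores), pvariance(scores)


def fused_separation(w, audio_true, audio_other, visual_true, visual_other):
    """Separation of fused scores s = w*audio + (1-w)*visual.

    Each *_true / *_other argument is a (mean, variance) pair.
    Assumes (for this sketch) independent Gaussian scores, so the
    fused means and variances follow in closed form.
    """
    (ma_t, va_t), (ma_o, va_o) = audio_true, audio_other
    (mv_t, vv_t), (mv_o, vv_o) = visual_true, visual_other
    mu_true = w * ma_t + (1 - w) * mv_t
    mu_other = w * ma_o + (1 - w) * mv_o
    var_true = w ** 2 * va_t + (1 - w) ** 2 * vv_t
    var_other = w ** 2 * va_o + (1 - w) ** 2 * vv_o
    # Small constant avoids division by zero for degenerate variances.
    return (mu_true - mu_other) / math.sqrt(var_true + var_other + 1e-12)


def estimate_weight(audio_true, audio_other, visual_true, visual_other,
                    steps=100):
    """Grid-search w in [0, 1] for the largest fused separation."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(
        candidates,
        key=lambda w: fused_separation(
            w, audio_true, audio_other, visual_true, visual_other
        ),
    )


if __name__ == "__main__":
    # Toy statistics, invented for illustration only.
    w = estimate_weight((2.0, 1.0), (0.0, 1.0), (2.0, 1.0), (0.0, 1.0))
    print(w)
```

The empirical alternative mentioned in the abstract would instead evaluate identification accuracy on held-out data for each candidate w and keep the best one; the sketch replaces that evaluation with a quantity computed from the fitted Gaussian parameters.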
Bibliographic reference. Hu, Roland / Damper, Robert I. (2007): "Audio-visual person identification on the XM2VTS database", In AVSP-2007, paper L5-3.