Auditory-Visual Speech Processing
(AVSP 2001)

September 7-9, 2001
Aalborg, Denmark

Modeling of Audiovisual Speech Perception in Noise

T.S. Andersen, K. Tiippana, J. Lampinen, M. Sams

Laboratory of Computational Engineering, Helsinki University of Technology, Finland

We present three models of audiovisual speech perception at varying signal-to-noise ratios (SNR). The first model is Massaro's Fuzzy Logical Model of Perception (FLMP) [1] applied at each SNR. The second model imposes the constraint that the visual response probabilities are the same regardless of the SNR. Both models describe the data well. The root mean squared error (RMSE), corrected for the number of degrees of freedom, was smaller for the second model. Consistent with this, a cross-validated paired t-test showed that the second model was significantly better at predicting individual performance despite its lower number of parameters. In the third model, a weighted FLMP, the SNR is parameterized, which substantially reduces the number of free parameters. This model fits the data significantly worse than the other two models, but it does capture salient features of the change in performance with varying SNR.
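As a rough illustration of the first two models, the sketch below applies the standard FLMP combination rule, in which the audiovisual response probability for each category is the product of the auditory and visual support values, normalized over all categories. The function name `flmp`, the three response categories, and all numeric values are hypothetical and are not taken from the paper. Reusing one visual vector across SNR levels mimics the constraint of the second model.

```python
def flmp(auditory, visual):
    """Combine per-category auditory and visual support values with the
    FLMP rule: P(r | A, V) = a_r * v_r / sum_k(a_k * v_k)."""
    if len(auditory) != len(visual):
        raise ValueError("auditory and visual must have the same length")
    products = [a * v for a, v in zip(auditory, visual)]
    total = sum(products)
    if total == 0:
        raise ValueError("at least one category needs non-zero joint support")
    return [p / total for p in products]


# Hypothetical visual support values, held fixed across SNR levels
# (the constraint imposed by the second model).
visual = [0.2, 0.2, 0.6]

# Hypothetical auditory support values at two SNR levels; the auditory
# evidence is flatter (less informative) at the lower SNR.
auditory_by_snr = {
    "high SNR": [0.6, 0.3, 0.1],
    "low SNR": [0.4, 0.3, 0.3],
}

predictions = {snr: flmp(a, visual) for snr, a in auditory_by_snr.items()}

for snr, probs in predictions.items():
    print(snr, [round(p, 3) for p in probs])
```

In this toy setup the visually favored third category gains probability as the auditory evidence becomes less informative, which is the kind of SNR-dependent shift the models are meant to describe. Under the first model, the visual vector would instead be free to differ at each SNR.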



Bibliographic reference.  Andersen, T.S. / Tiippana, K. / Lampinen, J. / Sams, M. (2001): "Modeling of audiovisual speech perception in noise", In AVSP-2001, 172-176.