ISCA Archive AVSP 2001

Modeling of audiovisual speech perception in noise

T.S. Andersen, K. Tiippana, J. Lampinen, M. Sams

We present three models of audiovisual speech perception at varying signal-to-noise ratios (SNR). The first model is Massaro's Fuzzy Logical Model of Perception (FLMP) [1], applied separately at each SNR. The second model adds the constraint that the visual response probabilities are the same regardless of the SNR. Both models describe the data well. The Root Mean Squared Error (RMSE), corrected for the number of degrees of freedom, was smaller for the second model. In agreement with this, a cross-validated paired t-test showed that the second model was significantly better at predicting individual performance despite its lower number of parameters. In the third model, a weighted FLMP, the SNR is parameterized, which reduces the number of free parameters substantially. This model fits the data significantly worse than the other two models, but it does capture salient features of the change in performance with varying SNR.
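The first two models can be illustrated with a minimal sketch. It assumes the standard FLMP combination rule, in which the audiovisual response probability for each category is the product of the auditory and visual response probabilities, normalized over all categories. The function names, the three-category example values, and the particular form of the degrees-of-freedom correction (sum of squared errors divided by the number of data points minus the number of free parameters) are illustrative assumptions, not details taken from the paper.

```python
import math


def flmp(auditory, visual):
    """Combine unimodal response probabilities with the FLMP rule.

    P(r | A, V) = a_r * v_r / sum_k (a_k * v_k)
    """
    products = [a * v for a, v in zip(auditory, visual)]
    total = sum(products)
    return [p / total for p in products]


def corrected_rmse(observed, predicted, n_free_params):
    """RMSE with a degrees-of-freedom correction (assumed form)."""
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    dof = len(observed) - n_free_params
    return math.sqrt(sse / dof)


# Hypothetical auditory response probabilities at two SNRs.
auditory_by_snr = {
    "high_snr": [0.8, 0.1, 0.1],
    "low_snr": [0.4, 0.3, 0.3],
}

# Second model's constraint: one set of visual response
# probabilities shared across all SNRs (hypothetical values).
visual_shared = [0.1, 0.2, 0.7]

predictions = {
    snr: flmp(auditory, visual_shared)
    for snr, auditory in auditory_by_snr.items()
}
```

In this sketch the first model would correspond to giving each SNR its own visual probability vector, whereas the second model reuses `visual_shared` everywhere, which is what lowers its parameter count.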

Cite as: Andersen, T.S., Tiippana, K., Lampinen, J., Sams, M. (2001) Modeling of audiovisual speech perception in noise. Proc. Auditory-Visual Speech Processing, 172-176

  author={T.S. Andersen and K. Tiippana and J. Lampinen and M. Sams},
  title={{Modeling of audiovisual speech perception in noise}},
  booktitle={Proc. Auditory-Visual Speech Processing},