Auditory-Visual Speech Processing

We present three models of audiovisual speech perception at varying signal-to-noise ratios (SNR). The first model is Massaro's Fuzzy Logical Model of Perception (FLMP) [1], applied separately at each SNR. The second model imposes the constraint that the visual response probabilities are the same regardless of the SNR. Both models describe the data well. The Root Mean Squared Error (RMSE), corrected for the number of degrees of freedom, was smaller for the second model. Consistent with this, a cross-validated paired t-test showed that the second model was significantly better at predicting individual performance despite its lower number of parameters. In a third model, a weighted FLMP, the SNR is parameterized, which reduces the number of free parameters substantially. This model fits the data significantly worse than the other two models, but it does capture salient features of the change in performance with varying SNR.
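As a rough illustration of the first two models, the standard FLMP combination rule multiplies the auditory and visual response probabilities for each response category and normalizes the products so they sum to one. The sketch below is not the authors' fitting code: the function names `flmp` and `predict_across_snr`, the two-category setup, and all numeric values are hypothetical, chosen only to show how holding the visual probabilities fixed across SNRs (the second model's constraint) enters the prediction.

```python
from typing import Dict, List, Sequence


def flmp(auditory: Sequence[float], visual: Sequence[float]) -> List[float]:
    """Standard FLMP rule: P(r | A, V) = a_r * v_r / sum_k(a_k * v_k)."""
    if len(auditory) != len(visual):
        raise ValueError("auditory and visual must cover the same response categories")
    products = [a * v for a, v in zip(auditory, visual)]
    total = sum(products)
    if total == 0:
        raise ValueError("all products are zero; the FLMP prediction is undefined")
    return [p / total for p in products]


def predict_across_snr(
    auditory_by_snr: Dict[float, Sequence[float]],
    visual: Sequence[float],
) -> Dict[float, List[float]]:
    """Second-model constraint (hypothetical helper): one shared set of visual
    probabilities is reused at every SNR; only the auditory ones vary."""
    return {snr: flmp(a, visual) for snr, a in auditory_by_snr.items()}


# Hypothetical values for two response categories at two SNRs (in dB).
auditory_by_snr = {
    -12.0: [0.5, 0.5],  # heavy noise: audition is uninformative
    0.0: [0.9, 0.1],    # mild noise: audition favors category 0
}
visual = [0.2, 0.8]     # vision favors category 1 at every SNR

predictions = predict_across_snr(auditory_by_snr, visual)
for snr, probs in predictions.items():
    print(snr, [round(p, 3) for p in probs])
```

When the auditory probabilities are uniform, the prediction reduces to the visual probabilities, so the visual influence grows as the SNR drops. Under the first model, `visual` would instead be a separate free parameter set for each SNR.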
Bibliographic reference. Andersen, T.S. / Tiippana, K. / Lampinen, J. / Sams, M. (2001): "Modeling of audiovisual speech perception in noise", In AVSP-2001, 172-176.