Auditory-Visual Speech Processing (AVSP'99)
August 7-10, 1999
This paper examines the degree of correlation between lip and jaw configuration and speech acoustics. The lip and jaw positions are characterised by a system of measurements taken from video images of the speaker's face and profile, and the acoustics are represented using line spectral pair parameters and a measure of RMS energy. A correlation is found between the measured acoustic parameters and a linear estimate of the acoustics recovered from the visual data. This correlation exists despite the simplicity of the mapping and is in rough agreement with correlations measured in earlier work by Yehia et al. The linear estimates are also compared with estimates made using non-linear models. In particular, it is shown that although the performance of the two models is remarkably similar for static visual features, non-linear models are better able to handle dynamic features.
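The linear estimation described above can be sketched as a least-squares mapping from visual features to acoustic parameters, with performance measured by the correlation between measured and estimated acoustics. The sketch below uses synthetic stand-in data; the feature dimensions, noise level, and variable names are illustrative assumptions, not the paper's actual measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (the paper used real measurements: lip/jaw
# positions from face and profile video, and line spectral pair
# parameters plus RMS energy from the audio).
n_frames, n_visual, n_acoustic = 500, 6, 11  # e.g. 10 LSPs + RMS energy
V = rng.normal(size=(n_frames, n_visual))          # visual features
W_true = rng.normal(size=(n_visual, n_acoustic))   # hidden linear map
A = V @ W_true + 0.3 * rng.normal(size=(n_frames, n_acoustic))  # acoustics

# Linear estimator: least-squares mapping from visual to acoustic space.
W, *_ = np.linalg.lstsq(V, A, rcond=None)
A_hat = V @ W

# Per-parameter correlation between measured and estimated acoustics.
corr = [np.corrcoef(A[:, i], A_hat[:, i])[0, 1] for i in range(n_acoustic)]
mean_corr = float(np.mean(corr))
print(f"mean correlation: {mean_corr:.2f}")
```

A non-linear comparison, as in the paper, would replace the least-squares fit with, for example, a small neural network trained on the same feature pairs.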
Bibliographic reference. Barker, J. P. / Berthommier, F. (1999): "Estimation of speech acoustics from visual speech features: A comparison of linear and non-linear models", In AVSP-1999, paper #19.