Introducing visual cues in acoustic-to-articulatory inversion

Olov Engwall

The contribution of facial measures to statistical acoustic-to-articulatory inversion has been investigated. The tongue contour was estimated by linear estimation from either the acoustics alone or the acoustics combined with facial measures. Measures of the lateral movement of the lip corners and the vertical movement of the upper lip, lower lip and jaw gave a substantial improvement over the audio-only case. It was further found that adding the corresponding articulatory measures that can be extracted from a profile view of the face, i.e. the protrusion of the lips, lip corners and jaw, did not improve the inversion result any further. The present study hence suggests that audiovisual-to-articulatory inversion can equally well be performed using monovision of the front view of the face, rather than stereovision of both the front and profile views.
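To make the audio-only versus audiovisual comparison concrete, the following is a minimal sketch, not the author's implementation. It assumes that "linear estimation" can be illustrated by ordinary least squares with an intercept, and it uses synthetic data in which one hypothetical tongue coordinate depends on hypothetical jaw and lip-corner measures as well as on two hypothetical acoustic features. The helper names (`solve`, `fit_linear`, `rmse`) and all coefficients are illustrative assumptions. For simplicity the error is measured on the fitting data, so the audiovisual fit, which contains the audio-only regressors as a subset, cannot do worse than the audio-only fit.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear(features, target):
    """Least-squares weights (intercept first) from the normal equations X^T X w = X^T y."""
    rows = [[1.0] + f for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)


def rmse(w, features, target):
    """Root-mean-square error of the linear estimate w on the given data."""
    errs = [(w[0] + sum(wi * fi for wi, fi in zip(w[1:], f)) - y) ** 2
            for f, y in zip(features, target)]
    return (sum(errs) / len(errs)) ** 0.5


# Hypothetical synthetic data: acoustics only partly reflect jaw and lip positions.
rng = random.Random(0)
audio, face, tongue = [], [], []
for _ in range(200):
    jaw = rng.gauss(0, 1)                 # hypothetical vertical jaw measure
    lip = rng.gauss(0, 1)                 # hypothetical lip-corner spread
    f1 = 0.6 * jaw + rng.gauss(0, 0.5)    # hypothetical acoustic feature
    f2 = 0.3 * lip + rng.gauss(0, 0.5)    # hypothetical acoustic feature
    audio.append([f1, f2])
    face.append([jaw, lip])
    tongue.append(0.8 * jaw + 0.4 * lip + 0.2 * f1 + rng.gauss(0, 0.1))

audiovisual = [a + v for a, v in zip(audio, face)]
w_a = fit_linear(audio, tongue)
w_av = fit_linear(audiovisual, tongue)
err_audio = rmse(w_a, audio, tongue)
err_av = rmse(w_av, audiovisual, tongue)
print(f"audio-only RMSE:   {err_audio:.3f}")
print(f"audio+facial RMSE: {err_av:.3f}")
```

A realistic evaluation would instead measure the error on held-out data, where adding measures that carry no new information (as the abstract reports for the profile-view measures) need not help and can even hurt.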

doi: 10.21437/Interspeech.2005-846

Cite as: Engwall, O. (2005) Introducing visual cues in acoustic-to-articulatory inversion. Proc. Interspeech 2005, 3205-3208, doi: 10.21437/Interspeech.2005-846

