INTERSPEECH 2004 - ICSLP
The strong association between audio speech features and the state of mouth opening is exploited for inversion in a comparative framework, using linear and non-linear models. First, an associative map between an array of visemes and the audio features is constructed through a statistical learning process. The visemic mapping is self-organized and, after convergence, the conditional mean of the audio features is associated with each viseme. Since the viseme states form a 2-dimensional continuum, the principle of the non-linear inversion models is to drive a continuous trajectory across the output space using less continuous audio inputs. Two strategies are proposed to smooth the output sequence. The first consists in filtering (reshaping) the input trajectory, and the second in driving a traveling wave. A comparative study including linear and non-linear models shows that the second strategy is plausible for modeling an associative cortical function.
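As a rough illustration of the first strategy only, the sketch below computes a conditional mean of audio features per viseme cell, filters the input trajectory, and maps each frame to the grid position of its nearest prototype. It is not the paper's method: the abstract does not specify the self-organizing procedure or the traveling-wave model, so a plain nearest-prototype lookup and a moving-average filter are swapped in as assumptions, and all function names (`conditional_means`, `smooth`, `invert`) are hypothetical.

```python
import numpy as np

def conditional_means(audio, cell_ids, n_cells):
    """Mean audio feature vector for each viseme cell (NaN for empty cells)."""
    means = np.full((n_cells, audio.shape[1]), np.nan)
    for c in range(n_cells):
        members = audio[cell_ids == c]
        if len(members):
            means[c] = members.mean(axis=0)
    return means

def smooth(audio, width=5):
    """Assumed filter: moving average along time, per feature column."""
    kernel = np.ones(width) / width
    cols = [np.convolve(col, kernel, mode="same") for col in audio.T]
    return np.column_stack(cols)

def invert(audio, means, grid):
    """Map each audio frame to the grid position of its nearest prototype."""
    valid = ~np.isnan(means).any(axis=1)  # skip cells that received no data
    dists = np.linalg.norm(audio[:, None, :] - means[valid][None, :, :], axis=2)
    return grid[valid][dists.argmin(axis=1)]
```

Here `audio` is a frames-by-features array, `cell_ids` gives the viseme cell assigned to each training frame, and `grid` holds the 2-D coordinates of each cell. Calling `invert(smooth(audio), means, grid)` yields a sequence of map positions that can still jump between cells, since the output remains quantized to the grid.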
Bibliographic reference. Berthommier, Frédéric (2004): "Comparative study of linear and non-linear models for viseme inversion: modeling of a cortical associative function", In INTERSPEECH-2004, 2517-2520.