ISCA Tutorial and Research Workshop on Statistical and Perceptual Audition (SAPA2008)
Recovering the motions of speech articulators from the acoustic speech signal has a long history, starting from the observation that a simple concatenated tube model is a reasonable model for the origin of formant resonances. In this work, we take a different approach, making minimal assumptions about the interdependence of acoustics and articulators by estimating the full joint distribution of the two spaces from a corpus of paired data derived from an articulatory synthesizer. This approach allows us to estimate posterior distributions of articulator state as well as to find the maximum-likelihood trajectories. We present examples comparing this approach to a related, earlier approach that did not incorporate prior distributions over articulator space, and demonstrate the advantages of learning the models from realistic utterances. We also indicate the benefits of jointly estimating particular pairs of articulators that have high mutual dependence.
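As a rough illustration of the idea of estimating a joint distribution from paired data and then conditioning on the acoustics, the sketch below builds a discrete joint table from (acoustic bin, articulator bin) pairs and reads off a per-frame posterior over articulator state. It is only a toy under stated assumptions: the abstract does not specify the model form, so the discretization into bins, the function names `fit_joint` and `posterior`, and the six-pair corpus are all hypothetical. It covers only the per-frame posterior, not the trajectory estimation.

```python
from collections import Counter

def fit_joint(pairs):
    """Estimate a discrete joint p(acoustic_bin, articulator_bin) by counting pairs."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

def posterior(joint, acoustic_bin):
    """Condition the joint table on one acoustic bin: p(articulator_bin | acoustic_bin)."""
    row = {art: p for (ac, art), p in joint.items() if ac == acoustic_bin}
    norm = sum(row.values())
    if norm == 0:
        return {}  # this acoustic bin never occurred in the paired corpus
    return {art: p / norm for art, p in row.items()}

# Hypothetical toy corpus of (acoustic_bin, articulator_bin) pairs; not real data.
pairs = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 2)]

joint = fit_joint(pairs)
post = posterior(joint, 0)          # distribution over articulator bins given acoustic bin 0
best = max(post, key=post.get)      # most probable articulator bin for that frame
```

Because the joint table is built from how often each articulator state occurs in the corpus, the posterior reflects the prior over articulator space as well as the acoustic evidence.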
Bibliographic reference. Lammert, Adam / Ellis, Daniel P. W. / Divenyi, Pierre (2008): "Data-driven articulatory inversion incorporating articulator priors", In SAPA-2008, 29-34.