Acoustic-to-articulatory inversion for vowels is performed by cepstral analysis-by-synthesis, using chain-matrix calculation of vocal tract (VT) acoustics and the Maeda articulatory model. The derivative of the VT chain matrix with respect to the area function is calculated in a novel, efficient manner and used in the BFGS quasi-Newton method to optimize a distance measure between input and synthesized cepstral features over the entire articulatory trajectory. The optimization is initialized by a fast search of an articulatory codebook with a bin structure in formant space. The cost function also includes regularization and continuity terms to obtain realistic inverted VT shapes and smooth articulatory trajectories. Inversion is evaluated on the three diphthongs /ai/, /oi/ and /au/ of two speakers, one male and one female, from the University of Wisconsin X-ray microbeam (XRMB) database. Good agreement is achieved between inverted midsagittal vocal tract outlines and measured XRMB tongue and lip pellet positions, with an average relative error of less than 3% in the first three formants.
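The structure of the trajectory cost described above can be illustrated with a minimal sketch. It is not the authors' implementation. The squared-error form of each term, the weights `w_reg` and `w_cont`, the neutral reference vector `neutral`, and the function names are all assumptions made for illustration. The `toy_synth` function is a hypothetical stand-in for the articulatory-model and chain-matrix synthesis step, which is not reproduced here.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def trajectory_cost(
    traj: List[Vector],
    target_cepstra: List[Vector],
    synthesize: Callable[[Vector], Vector],
    neutral: Vector,
    w_reg: float = 0.1,   # assumed weight, not taken from the paper
    w_cont: float = 1.0,  # assumed weight, not taken from the paper
) -> float:
    """Cost over a whole articulatory trajectory (illustrative form only).

    data term:       distance between synthesized and input cepstral features
    regularization:  penalizes frames far from a neutral articulation
    continuity:      penalizes jumps between consecutive frames
    """
    if len(traj) != len(target_cepstra):
        raise ValueError("one target feature vector is needed per frame")
    data = sum(sq_dist(synthesize(x), c) for x, c in zip(traj, target_cepstra))
    reg = sum(sq_dist(x, neutral) for x in traj)
    cont = sum(sq_dist(traj[t + 1], traj[t]) for t in range(len(traj) - 1))
    return data + w_reg * reg + w_cont * cont


def toy_synth(x: Vector) -> Vector:
    """Hypothetical linear map standing in for the real synthesizer."""
    return [2.0 * x[0], x[0] + x[1]]


# Two-frame example whose targets are exactly what toy_synth produces,
# so only the regularization and continuity terms contribute.
frames = [[0.0, 0.0], [1.0, 0.0]]
targets = [toy_synth(f) for f in frames]
cost = trajectory_cost(frames, targets, toy_synth, neutral=[0.0, 0.0])
```

In an inversion setting, a gradient-based optimizer such as BFGS would minimize a cost of this kind over all frames jointly, starting from a codebook-derived initial trajectory.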
Bibliographic reference. Panchapagesan, Sankaran / Alwan, Abeer (2008): "Vocal tract inversion by cepstral analysis-by-synthesis using chain matrices", In INTERSPEECH-2008, 2857-2860.