Knowledge of vocal-tract (VT) length is a logical prerequisite for acoustic-to-articulatory inversion. Prior work has treated VT length estimation (VTLE) and inversion largely as separate problems. We describe a new algorithm for VTLE based on acoustic-to-articulatory inversion. Our inversion process uses the Maeda model (MM) and combines global search with dynamic programming to transform speech waveforms into articulatory trajectories. The VTLE algorithm searches for the VT length of MM that yields the most accurate and smoothest inversion result. The algorithm was tested on samples of non-nasalized diphthongs (e.g., [ai]) that were synthesized with MM itself, synthesized with TubeTalker (a different VT model), or recorded from child and adult speakers; its performance was compared with that of a conventional formant-frequency-based method. Results on synthesized speech indicate that the inversion-based algorithm achieved greater VTLE accuracy and greater robustness against phonetic variation than the formant-based method. Furthermore, compared with the formant-based method, the inversion-based algorithm produced results that correlated more strongly with an MRI-derived VT length (VTL) measure in adults and were more consistent with previously reported age-VTL relations in children.
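For orientation, one textbook way to estimate VT length from formants treats the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances fall at F_n = (2n - 1)c / (4L). The sketch below illustrates that idea only; it is not necessarily the formant-based method evaluated in this paper, whose details the abstract does not give. The function name `uniform_tube_vtl` and the speed-of-sound value are assumptions made for illustration.

```python
from statistics import mean

# Assumed speed of sound in warm, humid air, in cm/s (illustrative value).
SPEED_OF_SOUND_CM_S = 35000.0


def uniform_tube_vtl(formants_hz, c=SPEED_OF_SOUND_CM_S):
    """Estimate VT length (cm) from formants F1..Fn.

    Hypothetical helper: assumes a uniform tube closed at one end, so that
    F_n = (2n - 1) * c / (4 * L). Each formant gives one length estimate,
    and the estimates are averaged.
    """
    if not formants_hz or any(f <= 0 for f in formants_hz):
        raise ValueError("formants_hz must be a non-empty list of positive values")
    estimates = [
        (2 * n - 1) * c / (4.0 * f)
        for n, f in enumerate(formants_hz, start=1)
    ]
    return mean(estimates)


# Formants of an idealized neutral vowel; all three imply the same length.
length_cm = uniform_tube_vtl([500.0, 1500.0, 2500.0])
```

Because real vowels and diphthongs deviate from a uniform tube, an estimate of this kind varies with the phonetic content of the input, which is the sensitivity to phonetic variation that the comparison above concerns.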
Bibliographic reference. Cai, Shanqing / Bunnell, H. Timothy / Patel, Rupal (2013): "Unsupervised vocal-tract length estimation through model-based acoustic-to-articulatory inversion", In INTERSPEECH-2013, 1712-1716.