ITRW on Non-Linear Speech Processing
(NOLISP 07)

Paris, France
May 22-25, 2007

Trajectory Mixture Density Network with Multiple Mixtures for Acoustic-articulatory Inversion

Korin Richmond

Centre for Speech Technology Research, Edinburgh University, Edinburgh, UK

We have previously proposed a trajectory model based on a mixture density network (MDN) trained with target variables augmented with dynamic features, together with an algorithm for estimating maximum likelihood trajectories that respects the constraints between those features. In this paper, we extend that model, the trajectory MDN (TMDN), to allow diagonal covariance matrices and multiple mixture components. We evaluate the model on an inversion mapping task and find that the trajectory model works well, outperforming smoothing of equivalent trajectories using low-pass filters. Increasing the number of mixture components in the TMDN improves results further.
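
The abstract does not spell out the trajectory estimation step, so the following is only a minimal sketch of the general idea, under several assumptions that are not taken from the paper: a single Gaussian with diagonal covariance per frame, one articulatory channel, and a simple delta window of 0.5 * (y[t+1] - y[t-1]) with clamped edges. The names `delta_window_matrix` and `ml_trajectory` are hypothetical. Given per-frame means and variances for the static and delta features, the static trajectory that maximises the likelihood under the static-to-delta constraint is the solution of a weighted least-squares problem. The multiple-mixture case treated in the paper is not covered here.

```python
import numpy as np


def delta_window_matrix(T):
    """Build W (2T x T) mapping a static trajectory y to stacked
    [static, delta] features per frame.

    Assumption: delta[t] = 0.5 * (y[t+1] - y[t-1]), edges clamped.
    """
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0                      # static feature row
        prev_t = max(t - 1, 0)
        next_t = min(t + 1, T - 1)
        W[2 * t + 1, next_t] += 0.5            # delta feature row
        W[2 * t + 1, prev_t] -= 0.5
    return W


def ml_trajectory(means, variances):
    """Maximum likelihood static trajectory for one channel.

    means, variances: arrays of shape (T, 2); column 0 is the static
    feature, column 1 the delta feature (diagonal covariance assumed).
    Solves (W^T P W) y = W^T P mu, where P is the diagonal precision.
    """
    T = means.shape[0]
    W = delta_window_matrix(T)
    mu = means.reshape(-1)                     # interleaved static, delta
    prec = 1.0 / variances.reshape(-1)
    A = W.T @ (prec[:, None] * W)
    b = W.T @ (prec * mu)
    return np.linalg.solve(A, b)
```

When the predicted static and delta means are mutually consistent, the solution reproduces the underlying trajectory exactly. When they disagree, the variances decide how far the result follows the static means versus the delta means, which is what distinguishes this approach from applying a fixed low-pass filter to the static means alone.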


Bibliographic reference.  Richmond, Korin (2007): "Trajectory mixture density network with multiple mixtures for acoustic-articulatory inversion", In NOLISP-2007, 67-70.