7th International Conference on Spoken Language Processing
September 16-20, 2002
A method for concatenative articulatory visual speech synthesis has been evaluated. The method uses concatenated units of articulatory parameter transitions, each running from the middle of one phoneme to the middle of the next, as input to a 3D parametric tongue model. The units were created by segmenting the electromagnetic articulography (EMA) measurements in a database of 460 phonetically balanced sentences collected at the University of Edinburgh. The synthesis was evaluated against the EMA database on which the movements were based and against X-ray films of three other speakers. The results show that the model replicates the natural movements globally, but that rare units in the concatenation database may cause large differences between the synthesized and the natural utterance, and that the tongue root and tongue tip movements are too restricted in the model.
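The concatenation idea can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `concatenate_units` and `rms_error`, the dictionary layout of `unit_db`, the averaging of the two frames at each join, and the use of an RMS difference as the comparison measure are all assumptions made for illustration, and the toy values are invented.

```python
from math import sqrt


def concatenate_units(phonemes, unit_db):
    """Join mid-phoneme-to-mid-phoneme units into one parameter trajectory.

    unit_db maps a (left, right) phoneme pair to a list of frames; each
    frame is a list of articulatory parameter values (hypothetical layout).
    """
    trajectory = []
    for left, right in zip(phonemes, phonemes[1:]):
        unit = unit_db[(left, right)]
        if trajectory:
            # The last frame so far and the first frame of the new unit both
            # describe the middle of the shared phoneme; average them
            # (an assumed smoothing choice, not taken from the paper).
            trajectory[-1] = [(a + b) / 2 for a, b in zip(trajectory[-1], unit[0])]
            trajectory.extend(list(frame) for frame in unit[1:])
        else:
            trajectory.extend(list(frame) for frame in unit)
    return trajectory


def rms_error(synth, measured):
    """Root-mean-square difference over all frames and parameters."""
    squared = [
        (s - m) ** 2
        for frame_s, frame_m in zip(synth, measured)
        for s, m in zip(frame_s, frame_m)
    ]
    return sqrt(sum(squared) / len(squared))


# Toy data: two hypothetical units with a single parameter per frame.
unit_db = {
    ("a", "t"): [[0.0], [0.5], [1.0]],
    ("t", "i"): [[1.5], [0.5], [0.25]],
}
synth = concatenate_units(["a", "t", "i"], unit_db)
measured = [[0.0], [0.5], [1.0], [0.5], [0.25]]
error = rms_error(synth, measured)
```

In this sketch a poorly matching unit shows up as a large local deviation at the join, which is one way to picture why rare units in the database could produce large differences between synthesized and natural utterances.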
Bibliographic reference. Engwall, Olov (2002): "Evaluation of a system for concatenative articulatory visual speech synthesis", In ICSLP-2002, 665-668.