Auditory-Visual Speech Processing (AVSP) 2013

Annecy, France
August 29 - September 1, 2013

Automatic Feature Selection for Acoustic-Visual Concatenative Speech Synthesis: Towards a Perceptual Objective Measure

Utpala Musti, Vincent Colotte, Slim Ouni, Caroline Lavecchia, Brigitte Wrobel-Dautcourt, Marie-Odile Berger

Université de Lorraine, LORIA, UMR 7503, Vandoeuvre-lès-Nancy, France

We present an iterative algorithm for automatic feature selection and weight tuning of the target cost in unit-selection-based audio-visual speech synthesis. For a given unit-selection corpus, we perform feature selection and weight tuning so that the ranking given by the target cost function is consistent with the ordering given by an objective dissimilarity measure. Guided by this objective metric, we explicitly eliminate features to reduce redundancy and noise in the target cost calculation. Finding an objective metric that is highly correlated with perception should improve the quality of the tuning, and the second part of the paper is a first attempt towards this goal. We first present a human-centered evaluation of the synthesized audio-visual speech, and then a preliminary analysis of its results in relation to the objective evaluation metrics. This analysis of the correlation between objective and subjective evaluation results shows interesting patterns that might help in designing better tuning metrics and objective evaluation techniques. The key point is to find a link between objective and perceptual measures.
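The idea of tuning a target cost so that its ranking of candidate units agrees with an objective dissimilarity measure can be illustrated with a small sketch. The abstract does not specify the algorithm, so everything below is an assumption made for illustration: the target cost is taken to be a weighted sum of per-feature mismatches, agreement between the two orderings is measured with Spearman rank correlation, and tuning is a simple coordinate search over a small weight grid in which a weight of 0.0 eliminates the feature. The names `score`, `tune_weights` and the toy data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    if va == 0 or vb == 0:
        return 0.0  # a constant ranking carries no ordering information
    return cov / math.sqrt(va * vb)


def score(mismatches, dissimilarity, weights):
    """Agreement between target-cost ranking and objective ranking."""
    # Assumed target cost: weighted sum of per-feature mismatches.
    costs = [sum(w * m for w, m in zip(weights, row)) for row in mismatches]
    return spearman(costs, dissimilarity)


def tune_weights(mismatches, dissimilarity, grid=(0.0, 0.5, 1.0, 2.0), max_iter=20):
    """Coordinate search over weights; a weight of 0.0 drops the feature."""
    n_features = len(mismatches[0])
    weights = [1.0] * n_features
    best = score(mismatches, dissimilarity, weights)
    for _ in range(max_iter):
        improved = False
        for j in range(n_features):
            for w in grid:
                trial = weights[:j] + [w] + weights[j + 1:]
                s = score(mismatches, dissimilarity, trial)
                if s > best + 1e-12:  # accept only strict improvements
                    weights, best, improved = trial, s, True
        if not improved:
            break
    return weights, best


# Hypothetical toy data: rows are candidate units, columns are features
# (1 = mismatch with the target). The third feature is unrelated to the
# objective dissimilarity, so it only adds noise to the cost.
toy_mismatches = [[0, 0, 1], [1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 0], [1, 0, 1]]
toy_dissimilarity = [0.0, 2.0, 1.0, 3.0, 0.0, 2.0]

if __name__ == "__main__":
    print(tune_weights(toy_mismatches, toy_dissimilarity))
```

Because the search only accepts strict improvements, the tuned score is never worse than the uniform-weight baseline, but a greedy search of this kind can stop at a local optimum. A perceptually motivated dissimilarity measure would be substituted for `toy_dissimilarity` in the same way.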

Index Terms: Unit selection, audio-visual speech synthesis, target cost, target feature selection, weight tuning


Bibliographic reference. Musti, Utpala / Colotte, Vincent / Ouni, Slim / Lavecchia, Caroline / Wrobel-Dautcourt, Brigitte / Berger, Marie-Odile (2013): "Automatic feature selection for acoustic-visual concatenative speech synthesis: towards a perceptual objective measure", in AVSP-2013, 175-180.