The ESCA Workshop on Speech Synthesis

September 25-28, 1990
Autrans, France

Classification of Lip-Shapes and Their Association with Acoustic Speech Events

N. Michael Brooke, Paul D. Templeton

School of Mathematical Sciences, University of Bath, Bath, UK

Digital image-processing techniques permit the capture and analysis of the perceptually informative visual speech cues presented by the mouth region of a speaker. These cues may be used to enhance automatic speech recognition, particularly where there is acoustic noise. Conversely, visual speech synthesis could be used, for example, to improve speech intelligibility on low-bandwidth channels such as telephone lines. One way to create visual displays efficiently might be to generate image sequences from a codebook of mouth shapes. To assess the potential of this approach to synthesis, an initial experiment has been performed in which four selected parameters were used to cluster mouth images captured at the nuclei of eleven non-diphthongal British English vowels enunciated in a /bVb/ context. The association between the clusters and specific vowel productions was then investigated. A multi-layer perceptron (MLP) was also applied to assess the visual distinctiveness of the vowels.
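The abstract does not name the clustering algorithm, so the following is only a minimal sketch of the general idea, assuming k-means (Lloyd's algorithm) as a stand-in. Each mouth image is reduced to a vector of four parameters, the vectors are clustered into a small codebook, and the association between clusters and vowels is summarised as a cluster-by-vowel count table plus a purity score. The functions `kmeans`, `cluster_vowel_table` and `purity`, the vowel labels `V1` to `V3`, and all numeric values are hypothetical toy inputs invented for illustration. They are not data or results from this study.

```python
import math
from collections import Counter


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means on equal-length tuples (an assumed stand-in method).

    Centroids are initialised deterministically from the first k points.
    Returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def cluster_vowel_table(labels, vowels):
    """Count how many images of each vowel fall into each cluster."""
    return Counter(zip(vowels, labels))


def purity(labels, vowels):
    """Fraction of images belonging to the majority vowel of their cluster."""
    best = {}
    for (_vowel, cluster), n in cluster_vowel_table(labels, vowels).items():
        best[cluster] = max(best.get(cluster, 0), n)
    return sum(best.values()) / len(labels)


# Hypothetical toy input: four invented lip parameters per image (arbitrary
# units) paired with invented vowel labels. Not measurements from the study.
samples = [
    ((50, 20, 60, 30), "V1"), ((30, 5, 40, 15), "V2"), ((55, 8, 65, 18), "V3"),
    ((52, 21, 61, 31), "V1"), ((31, 6, 41, 14), "V2"), ((54, 9, 64, 19), "V3"),
    ((49, 19, 59, 29), "V1"), ((29, 4, 39, 16), "V2"), ((56, 7, 66, 17), "V3"),
]
points = [p for p, _ in samples]
vowels = [v for _, v in samples]

labels, codebook = kmeans(points, k=3)
print(purity(labels, vowels))  # → 1.0 for these well-separated toy points
```

In this sketch the centroids play the role of codebook entries (representative mouth shapes). With real lip parameters, clusters would not be expected to align perfectly with vowels, and the count table would show which vowels share visually similar mouth shapes.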

