Auditory-Visual Speech Processing 2007 (AVSP2007)
Kasteel Groenendaal, Hilvarenbeek, The Netherlands
Audio-visual speech recognition is used to make speech recognition more robust when the acoustic speech signal is degraded by environmental noise. Because visual speech is not affected by acoustic noise, it serves as supplementary information for recognition. The shape of a speaker's lips is usually described by either pixel-based or shape-based features, and much work has been published on both approaches.
After experiments with various parameterizations, we decided to use expert knowledge about human lip-reading to develop a new parameterization. We used information provided by human lip-reading experts and speech therapists. Based on this information, we designed a combined parameterization for describing the visual part of speech. Experiments with this parameterization were performed on two databases: the English audio-visual database XM2VTS and the Czech audio-visual database UWB-05-HSCAVC.
The designed parameterization combines both basic approaches: it uses a shape-based description for the outer lip contour and a pixel-based description for the inner part of the mouth. In the experiments, this parameterization outperformed the traditionally used parameterizations.
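The idea of combining the two descriptions can be illustrated with a minimal sketch. The abstract does not state which specific features were used, so everything below is an assumption made for illustration: the shape part is taken to be the width, height, and area of the outer lip contour, and the pixel part is taken to be a few low-order 2-D DCT coefficients of a grayscale patch of the inner mouth. The function names (`shape_features`, `pixel_features`, `combined_features`) and the `keep` parameter are hypothetical.

```python
import numpy as np


def shape_features(contour):
    """Width, height, and area of a closed contour given as an (N, 2) array of (x, y).

    Assumed shape-based features; the paper's actual features may differ.
    """
    contour = np.asarray(contour, dtype=float)
    x, y = contour[:, 0], contour[:, 1]
    width = x.max() - x.min()
    height = y.max() - y.min()
    # Shoelace formula for the area of a simple polygon.
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    return np.array([width, height, area])


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def pixel_features(roi, keep=3):
    """Lowest keep x keep 2-D DCT coefficients of a grayscale patch.

    Assumed pixel-based features for the inner mouth region.
    """
    roi = np.asarray(roi, dtype=float)
    rows, cols = roi.shape
    coeffs = dct_matrix(rows) @ roi @ dct_matrix(cols).T
    return coeffs[:keep, :keep].ravel()


def combined_features(contour, roi, keep=3):
    """Concatenate the shape-based and pixel-based parts into one vector."""
    return np.concatenate([shape_features(contour), pixel_features(roi, keep)])


# Toy input: a rectangular "contour" and a uniform 4 x 4 patch.
contour = [(0, 0), (4, 0), (4, 2), (0, 2)]
roi = np.ones((4, 4))
features = combined_features(contour, roi)
```

In a real system the two parts would typically be normalized before concatenation, since contour measurements and DCT coefficients live on different scales; that step is omitted here.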
Bibliographic reference. Císař, Petr / Železný, Miloš / Zelinka, Jan / Trojanová, Jana (2007): "Development and testing of new combined visual speech parameterization", in AVSP-2007, paper P31.