ISCA Archive SPECOM 2004

Comparison of 2d and 3d analysis for automated cued speech gesture recognition

Alice Caplier, Laurent Bonnaud, Sotiris Malassiotis, Michael G. Strintzis

This paper addresses the automated classification of cued speech gestures. Cued speech is a specific gesture language (distinct from sign language) used for communication between deaf people and others. It uses only 8 different hand configurations. The aim of this work is to apply a simple classifier to three image data sets in order to answer two main questions: is 3D data needed, and how important is the quality of the hand segmentation? The first data set consists of images acquired with a single camera in a controlled lighting environment, with a segmentation (called “2D segmentation”) based on luminance information. The second data set is acquired with a 3D camera that produces a depth map; a segmentation (called “3D segmentation”) of the hand configurations based on both the video and the depth map is performed. The third data set consists of 3D-segmented masks in which the resulting hand mask is warped to compensate for hand pose variations. For classification purposes, each hand configuration is characterized by the seven Hu moment invariants. A supervised classification using a multi-layer perceptron is then performed. The classification performance obtained with 2D and 3D information is compared.
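The shape descriptor named in the abstract, the seven Hu moment invariants, can be computed directly from a segmented binary hand mask. The sketch below is an illustrative NumPy implementation of the standard Hu formulas (normalized central moments combined into seven translation-, scale-, and rotation-invariant values); it is not the authors' code, and the example mask is hypothetical.

```python
import numpy as np

def hu_moments(mask: np.ndarray) -> np.ndarray:
    """Seven Hu moment invariants of a binary (0/1) mask."""
    ys, xs = np.nonzero(mask)            # pixel coordinates of the shape
    m00 = float(len(xs))                 # area (zeroth-order moment)
    dx, dy = xs - xs.mean(), ys - ys.mean()

    def eta(p, q):
        # normalized central moment: mu_pq / mu_00^(1 + (p+q)/2)
        return np.sum(dx**p * dy**q) / m00 ** (1 + (p + q) / 2)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    e30, e03, e21, e12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)

    h1 = e20 + e02
    h2 = (e20 - e02) ** 2 + 4 * e11 ** 2
    h3 = (e30 - 3 * e12) ** 2 + (3 * e21 - e03) ** 2
    h4 = (e30 + e12) ** 2 + (e21 + e03) ** 2
    h5 = ((e30 - 3 * e12) * (e30 + e12)
          * ((e30 + e12) ** 2 - 3 * (e21 + e03) ** 2)
          + (3 * e21 - e03) * (e21 + e03)
          * (3 * (e30 + e12) ** 2 - (e21 + e03) ** 2))
    h6 = ((e20 - e02) * ((e30 + e12) ** 2 - (e21 + e03) ** 2)
          + 4 * e11 * (e30 + e12) * (e21 + e03))
    h7 = ((3 * e21 - e03) * (e30 + e12)
          * ((e30 + e12) ** 2 - 3 * (e21 + e03) ** 2)
          - (e30 - 3 * e12) * (e21 + e03)
          * (3 * (e30 + e12) ** 2 - (e21 + e03) ** 2))
    return np.array([h1, h2, h3, h4, h5, h6, h7])

# Hypothetical L-shaped "hand mask": the invariants are unchanged
# by translation and rotation of the shape within the image.
mask = np.zeros((20, 20))
mask[2:12, 3:7] = 1
mask[2:6, 3:15] = 1
features = hu_moments(mask)   # 7-element feature vector for the classifier
```

In the paper's pipeline, such a 7-element vector would then be fed to a multi-layer perceptron trained on labeled examples of the 8 hand configurations.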

Cite as: Caplier, A., Bonnaud, L., Malassiotis, S., Strintzis, M.G. (2004) Comparison of 2d and 3d analysis for automated cued speech gesture recognition. Proc. 9th Conference on Speech and Computer (SPECOM 2004), 35-41

@inproceedings{caplier04_specom,
  author={Alice Caplier and Laurent Bonnaud and Sotiris Malassiotis and Michael G. Strintzis},
  title={{Comparison of 2d and 3d analysis for automated cued speech gesture recognition}},
  booktitle={Proc. 9th Conference on Speech and Computer (SPECOM 2004)},
  year={2004},
  pages={35--41}
}