Animations of sign language can increase the accessibility of information for people who are deaf or hard of hearing (DHH), but prior work has demonstrated that accurate non-manual expressions (NMEs), consisting of face and head movements, are necessary to produce linguistically accurate animations that are easy to understand. When synthesizing an animation, given a sequence of signs performed on the hands (and their timing), we must select an NME performance. Given a corpus of facial motion-capture recordings of ASL sentences with annotation of the timing of signs in each recording, we investigate methods (based on word count and on delexicalized sign timing) for selecting the best NME recording to use as a basis for synthesizing a novel animation. By comparing recordings selected using these methods to a gold-standard recording, we identify the top-performing exemplar selection method for several NME categories.
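As a rough illustration of the kind of exemplar selection the abstract describes, the sketch below picks a corpus recording by first matching sign count and then minimizing a simple per-sign timing distance. The corpus structure, the distance metric, and the function names are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch: select an exemplar NME recording for a target
# sign sequence by (1) closest sign count, then (2) smallest timing
# distance. Not the authors' actual selection algorithm.

def timing_distance(a, b):
    """Sum of absolute differences between per-sign durations,
    padding the shorter sequence with zeros."""
    n = max(len(a), len(b))
    a = list(a) + [0.0] * (n - len(a))
    b = list(b) + [0.0] * (n - len(b))
    return sum(abs(x - y) for x, y in zip(a, b))

def select_exemplar(target_durations, corpus):
    """Pick the recording whose sign count best matches the target,
    breaking ties by the smallest timing distance.

    corpus: list of (recording_id, per_sign_durations) pairs."""
    return min(
        corpus,
        key=lambda rec: (abs(len(rec[1]) - len(target_durations)),
                         timing_distance(rec[1], target_durations)),
    )[0]
```

A word-count-based variant would use only the first element of the sort key; the timing-based variant described in the abstract would presumably operate on delexicalized sign timings rather than the raw durations assumed here.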
Cite as: Kacorri, H., Huenerfauth, M. (2016) Selecting Exemplar Recordings of American Sign Language Non-Manual Expressions for Animation Synthesis Based on Manual Sign Timing. Proc. 7th Workshop on Speech and Language Processing for Assistive Technologies (SLPAT 2016), 14-19, doi: 10.21437/SLPAT.2016-3
@inproceedings{kacorri16_slpat,
  author={Hernisa Kacorri and Matt Huenerfauth},
  title={{Selecting Exemplar Recordings of American Sign Language Non-Manual Expressions for Animation Synthesis Based on Manual Sign Timing}},
  year={2016},
  booktitle={Proc. 7th Workshop on Speech and Language Processing for Assistive Technologies (SLPAT 2016)},
  pages={14--19},
  doi={10.21437/SLPAT.2016-3}
}